Description

The present invention relates generally to disaster 
recovery techniques in data processing systems, and 
more particularly to a system for real-time remote cop- 
ying of data. 

Data processing systems, in conjunction with 
processing data, typically are required to store large 
amounts of data (or records), which data can be effi- 
ciently accessed, modified, and re-stored. Data storage 
is typically separated into several different levels, or hi- 
erarchically, in order to provide efficient and cost effec- 
tive data storage. A first, or highest level of data storage 
involves electronic memory, usually dynamic or static 
random access memory (DRAM or SRAM). Electronic 
memories take the form of semiconductor integrated cir- 
cuits wherein millions of bytes of data can be stored on 
each circuit, with access to such bytes of data measured 
in nano-seconds. The electronic memory provides the 
fastest access to data since access is entirely electronic. 

A second level of data storage usually involves di- 
rect access storage devices (DASD). DASD storage, for 
example, can comprise magnetic and/or optical disks, 
which store bits of data as micrometer-sized magnetically
or optically altered spots on a disk surface for representing
the "ones" and "zeros" that make up those bits of the
data. Magnetic DASD includes one or more disks that
are coated with remnant magnetic material. The disks 
are rotatably mounted within a protected environment. 
Each disk is divided into many concentric tracks, or 
closely spaced circles. The data is stored serially, bit by 
bit, along each track. An access mechanism, known as 
a head disk assembly (HDA), typically includes one or 
more read/write heads, and is provided in each DASD 
for moving across the tracks to transfer the data to and 
from the surface of the disks as the disks are rotated 
past the read/write heads. DASDs can store gigabytes
of data with the access to such data typically measured
in milliseconds (orders of magnitude slower than electronic
memory). Access to data stored on DASD is slower
due to the need to physically position the disk and
HDA to the desired data storage location.

A third or lower level of data storage includes tape 
and/or tape and DASD libraries. At this storage level, 
access to data is much slower in a library since a robot 
is necessary to select and load the needed data storage 
medium. The advantage is reduced cost for very large 
data storage capabilities, for example, tera-bytes of data 
storage. Tape storage is often used for back-up purpos- 
es, that is, data stored at the second level of the hierar- 
chy is reproduced for safe keeping on magnetic tape. 
Access to data stored on tape and/or in a library is pres- 
ently of the order of seconds. 

Having a back-up data copy is mandatory for many 
businesses as data loss could be catastrophic to the 
business. The time required to recover data lost at the 
primary storage level is also an important recovery consideration.
An improvement in speed over tape or library
back-up includes dual copy. An example of dual copy
involves providing additional DASDs so that data is written
to the additional DASDs (sometimes referred to as
mirroring). Then if the primary DASDs fail, the secondary
DASDs can be depended upon for data. A drawback
to this approach is that the number of required DASDs
is doubled.

Another data back-up alternative that overcomes
the need to provide double the storage devices involves
writing data to a redundant array of inexpensive devices
(RAID) configuration. In this instance, the data is written
such that the data is apportioned amongst many
DASDs. If a single DASD fails, then the lost data can be
recovered by using the remaining data and error correction
procedures. Currently there are several different
RAID configurations available.

The aforementioned back-up solutions are generally
sufficient to recover data in the event that a storage
device or medium fails. These back-up methods are
useful only for device failures since the secondary data
is a mirror of the primary data, that is, the secondary
data has the same volume serial numbers (VOLSERs)
and DASD addresses as the primary data. System failure
recovery, on the other hand, is not available using
mirrored secondary data. Hence still further protection
is required for recovering data if a disaster occurs
destroying the entire system or even the site, for example,
earthquakes, fires, explosions, hurricanes, etc. Disaster
recovery requires that the secondary copy of data be
stored at a location remote from the primary data. A
known method of providing disaster protection is to
back-up data to tape, on a daily or weekly basis, etc.
The tape is then picked up by a vehicle and taken to a
secure storage area usually some kilometers away from
the primary data location. A problem is presented in this
back-up plan in that it could take days to retrieve the
back-up data, and meanwhile several hours or even
days of data could be lost, or worse, the storage location
could be destroyed by the same disaster. A somewhat
improved back-up method would be to transmit data to
a back-up location each night. This allows the data to
be stored at a more remote location. Again, some data
may be lost between back-ups since back-up does not
occur continuously, as in the dual copy solution. Hence,
a substantial data amount could be lost which may be
unacceptable to some users.

More recently introduced data disaster recovery solutions
include remote dual copy wherein data is
backed-up not only remotely, but also continuously. In
order to communicate duplexed data from one host
processor to another host processor, or from one storage
controller to another storage controller, or some
combination thereof, a substantial amount of control data
is required for realizing the process. A high overhead,
however, can interfere with a secondary site's ability to
keep up with a primary site's processing, thus threatening
the ability of the secondary site to be able to recover
the primary in the event a disaster occurs.

Accordingly it is desired to provide a method and
apparatus for providing a real time update of data consistent
with the data at a primary processing location using
minimal control data, wherein the method and apparatus
operate independently of the particular application
data being recovered, that is, are generic storage media
based rather than specific application data based.

An aim of the present invention is to provide an im- 
proved system and method for shadowing DASD data 
to a secondary site for disaster recovery. 

According to a first aspect of the present invention, 
a method for forming consistency groups provides for 
disaster recovery capability from a remote site. Data up- 
dates generated by one or more applications running in 
a primary processor are received by a primary storage 
subsystem, wherein the primary storage subsystem 
causes I/O write operations to write each data update 
therein. The primary storage subsystem is synchronized 
by a common timer, and a secondary system, remote 
from the primary processor, shadows the data updates 
in sequence consistent order such that the secondary 
site is available for disaster recovery purposes. The
method comprises the steps of: (a) time stamping each
write I/O operation occurring in the primary storage sub- 
system; (b) capturing write I/O operation record set in- 
formation from the primary storage subsystem for each 
data update; (c) generating self describing record sets 
from the data updates and the respective record set in- 
formation, such that the self describing record sets are 
sufficient to re-create a sequence of the write I/O oper- 
ations; (d) grouping the self describing record sets into 
interval groups based upon a predetermined interval 
threshold; and (e) selecting a first consistency group as 
that interval group of self describing record sets having 
an earliest operational time stamp, the individual data 
updates being ordered within the first consistency group 
based upon time sequences of the I/O write operations 
in the primary storage subsystem. 
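
As a rough illustration of steps (a) through (e), the following Python sketch groups time-stamped write records into fixed intervals and selects the earliest group, ordered by write time. It is a simplification under stated assumptions: the names `RecordSet`, `form_interval_groups` and `first_consistency_group`, the integer time stamps, and the use of a fixed-width interval as the "predetermined interval threshold" are hypothetical and are not taken from the specification.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordSet:
    """Hypothetical self describing record set for one write I/O."""
    timestamp: int   # time stamp of the write I/O (assumed integer ticks)
    controller: str  # storage controller that performed the write
    volume: str      # volume the update was written to
    data: bytes      # the data update itself


def form_interval_groups(record_sets, interval):
    """Step (d): bucket record sets by time interval (assumed fixed width)."""
    groups = defaultdict(list)
    for rs in record_sets:
        groups[rs.timestamp // interval].append(rs)
    return dict(groups)


def first_consistency_group(groups):
    """Step (e): take the earliest interval group, ordered by write time."""
    if not groups:
        return []
    earliest = min(groups)
    return sorted(groups[earliest], key=lambda rs: rs.timestamp)
```

Sorting inside the selected group is what lets a secondary site replay the updates in the order in which the write I/O operations occurred, regardless of the order in which the record sets arrived.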

In another aspect of the present invention, a primary 
system has a primary processor running one or more 
applications, wherein the applications generate record
updates, and the primary processor generates self describing
record sets therefrom. Each self describing
record set is sent to a secondary system remote from 
the primary system, wherein the secondary system 
shadows the record updates in sequence consistent or- 
der based upon the self describing record sets for real 
time disaster recovery purposes. The primary processor 
is coupled to a primary storage subsystem wherein the 
primary storage subsystem receives the record updates 
and causes I/O write operations for storing each record 
update therein. The primary processor comprises a sys- 
plex timer for providing a common time source to the 
applications and to the primary storage subsystem for 
synchronization purposes, and a primary data mover, 
synchronized by the sysplex timer, prompts the primary 
storage subsystem for providing record set information 
to the primary data mover for each record update. The
primary data mover groups a plurality of record updates
and each corresponding record set information into time
interval groups, and inserts a prefix header thereto.
Each time interval group forms the self describing record
sets.

A preferred embodiment of the invention will now 
be described, by way of example only, with reference to 
the accompanying drawings in which: 

FIG. 1 is a block diagram of a disaster recovery system
having synchronous remote data shadowing capabilities.

FIG. 2 is a flow diagram of a method for providing
synchronous remote copy according to the disaster recovery
system of FIG. 1.

FIG. 3 is a flow diagram of a method of an I/O error
recovery program (I/O ERP) operation.

FIG. 4 is a block diagram of a disaster recovery system
having asynchronous remote data shadowing capabilities.

FIG. 5 is a data format diagram showing a prefix
header that prefixes read record sets from the primary
site of FIG. 4.

FIG. 6 is a data format diagram describing fields
making up a read record set.

FIG. 7 is a state table identifying volume configuration
information.

FIG. 8 is a master journal as used by the secondary
site of FIG. 4.

FIG. 9 is an example sequence for forming a consistency
group.

FIG. 10 is a flow diagram showing a method of collecting
information and read record sets for forming consistency
groups.

FIG. 11 is a flow diagram showing a method of forming
consistency groups.

FIG. 12 is a table indicating full consistency group
recovery rules application for an ECKD architecture device
for a given sequence of I/O operations to a DASD
track.

FIG. 13 is a description of the rules to be used in
the table of FIG. 12.

FIG. 14 is a flow diagram of a method of writing read
record set copies to a secondary site with full consistency
group recovery capability.

A typical data processing system may take the form
of a host processor, such as an IBM System/360 or IBM
System/370 processor for computing and manipulating
data, and running, for example, data facility storage
management subsystem/multiple virtual systems
(DFSMS/MVS) software, having at least one IBM 3990 storage
controller attached thereto, the storage controller
comprising a memory controller and one or more cache
memory types incorporated therein. The storage controller
is further connected to a group of direct access
storage devices (DASDs) such as IBM 3380 or 3390
DASDs. While the host processor provides substantial
computing power, the storage controller provides the
necessary functions to efficiently transfer, stage/destage,
convert and generally access large databases.

Disaster recovery protection for the typical data 
processing system requires that primary data stored on 
primary DASDs be backed-up at a secondary or remote 
location. The distance separating the primary and sec- 
ondary locations depends upon the level of risk accept- 
able to the user, and can vary from several kilometers 
to thousands of kilometers. The secondary or remote 
location, in addition to providing a back-up data copy, 
must also have enough system information to take over 
processing for the primary system should the primary 
system become disabled. This is due in part to the fact that a
single storage controller does not write data to both pri- 
mary and secondary DASD strings at the primary and 
secondary sites. Instead, the primary data is stored on 
a primary DASD string attached to a primary storage 
controller while the secondary data is stored on a sec- 
ondary DASD string attached to a secondary storage 
controller. 

The secondary site must not only be sufficiently re- 
mote from the primary site, but must also be able to 
back-up primary data in real time. The secondary site 
needs to back-up primary data as the primary data is 
updated with some minimal delay. Additionally, the sec- 
ondary site has to back-up the primary data regardless 
of the application program (e.g., IMS, DB2) running at 
the primary site and generating the data and/or updates. 
A difficult task required of the secondary site is that the 
secondary data must be order consistent, that is, sec- 
ondary data is copied in the same sequential order as 
the primary data (sequential consistency) which re- 
quires substantial systems considerations. Sequential 
consistency is complicated by the existence of multiple 
storage controllers each controlling multiple DASDs in 
a data processing system. Without sequential consist- 
ency, secondary data inconsistent with primary data 
would result, thus corrupting disaster recovery. 

Remote data duplexing falls into two general cate- 
gories, synchronous and asynchronous. Synchronous 
remote copy involves sending primary data to the sec- 
ondary location and confirming the reception of such da- 
ta before ending a primary DASD input/output (I/O) op- 
eration (providing a channel end (CE) and device end 
(DE) to the primary host). Synchronous copy, therefore, 
slows the primary DASD I/O response time while waiting 
for secondary confirmation. Primary I/O response delay 
is increased proportionately with the distance between 
the primary and secondary systems - a factor that limits 
the remote distance to tens of kilometers. Synchronous 
copy, however, provides sequentially consistent data at 
the secondary site with relatively little system overhead. 

Asynchronous remote copy provides better primary 
application system performance because the primary 
DASD I/O operation is completed (providing a channel 
end (CE) and device end (DE) to the primary host) be- 
fore data is confirmed at the secondary site. Therefore, 
the primary DASD I/O response time is not dependent 
upon the distance to the secondary site and the secondary
site could be thousands of kilometers remote from
the primary site. A greater amount of system overhead
is required, however, for ensuring data sequence consistency
since data received at the secondary site will
often not be in order of the primary updates. A failure at
the primary site could result in some data being lost that 
was in transit between the primary and secondary loca- 
tions. 
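
The trade-off between the two categories can be made concrete with a small back-of-the-envelope model. The sketch below is illustrative only: the function names are hypothetical, the one-way propagation delay of about 5 microseconds per kilometer of optical fiber is an assumed round figure, and the model ignores protocol exchanges and queuing, which add further delay in practice.

```python
# Assumption: roughly 5 microseconds per kilometer, one way, in optical fiber.
FIBER_DELAY_US_PER_KM = 5.0


def sync_response_us(local_write_us, remote_write_us, distance_km):
    """Synchronous copy: CE/DE is withheld until the secondary confirms,
    so the response time includes the remote write and one round trip."""
    round_trip_us = 2 * distance_km * FIBER_DELAY_US_PER_KM
    return local_write_us + remote_write_us + round_trip_us


def async_response_us(local_write_us, remote_write_us, distance_km):
    """Asynchronous copy: CE/DE is presented once the primary write is
    done, so neither the remote write nor the distance contributes."""
    return local_write_us


if __name__ == "__main__":
    for km in (20, 200, 2000):
        print(km, sync_response_us(1000, 1000, km), async_response_us(1000, 1000, km))
```

Under these assumptions the synchronous response time grows linearly with distance while the asynchronous response time stays flat, which is the behavior the two preceding paragraphs describe.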

Synchronous real time remote copy for disaster recovery
requires that copied DASD volumes form a set.
Forming such a set further requires that a sufficient
amount of system information be provided to the secondary
site for identifying those volumes (VOLSERs)
comprising each set and the primary site equivalents.
Importantly, the secondary site forms a "duplex pair"
with the primary site and the secondary site must recognize
when one or more volumes are out of sync with
the set, that is, "failed duplex" has occurred. Connect
failures are more visible in synchronous remote copy
than in asynchronous remote copy because the primary
DASD I/O is delayed while alternate paths are retried.
The primary site can abort or suspend copy to allow the
primary site to continue while updates for the secondary
site are queued, the primary site marking such updates
to show the secondary site is out of sync. Recognizing
exception conditions that may cause the secondary site
to fall out of sync with the primary site is needed in order
that the secondary site be available at any time for disaster
recovery. Error conditions and recovery actions
must not make the secondary site inconsistent with the
primary site.

Maintaining a connection between the secondary
site and the primary site with secondary DASD present
and accessible, however, does not ensure content
synchronism. The secondary site may lose synchronism
with the primary site for a number of reasons. The secondary
site is initially out of sync when the duplex pair
is being formed and reaches sync when an initial data
copy is completed. The primary site may break the
duplex pair if the primary site is unable to write updated
data to the secondary site in which case the primary site
writes updates to the primary DASD under suspended
duplex pair conditions so that the updating application
can continue. The primary site is thus running exposed,
that is, without current disaster protection copy until the
duplex pair is restored. Upon restoring the duplex pair,
the secondary site is not immediately in sync. After applying
now pending updates, the secondary site returns
to sync. The primary site can also cause the secondary
site to lose sync by issuing a suspend command for that
volume to the primary DASD. The secondary site re-syncs
with the primary site after the suspend command
is ended, duplex pair is re-established, and pending updates
are copied. On-line maintenance can also cause
synchronization to be lost.
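
The ways in which a duplex pair gains and loses synchronism can be summarized as a small state machine. The following is a minimal sketch only: the state labels, event names and the `next_state` helper are hypothetical conveniences for illustration, not identifiers used by the storage controllers.

```python
from enum import Enum


class PairState(Enum):
    PENDING = "pending"      # pair being formed or pending updates being copied
    DUPLEX = "duplex"        # secondary in sync with primary
    SUSPENDED = "suspended"  # pair broken or suspended; primary runs exposed


# (current state, event) -> next state. Event names are illustrative.
TRANSITIONS = {
    (PairState.PENDING, "initial_copy_complete"): PairState.DUPLEX,
    (PairState.PENDING, "pending_updates_copied"): PairState.DUPLEX,
    (PairState.DUPLEX, "secondary_write_failed"): PairState.SUSPENDED,
    (PairState.DUPLEX, "suspend_command"): PairState.SUSPENDED,
    (PairState.SUSPENDED, "pair_reestablished"): PairState.PENDING,
}


def next_state(state, event):
    """Return the next pair state; events that do not apply leave it unchanged."""
    return TRANSITIONS.get((state, event), state)


def usable_for_recovery(state):
    """Only a volume whose pair is in sync may be used for recovery."""
    return state is PairState.DUPLEX
```

Note that re-establishing a pair leads to the pending state rather than directly to duplex, reflecting the point that the secondary site is not immediately in sync after the pair is restored.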

When a secondary volume is out of sync with a pri- 
mary volume, the secondary volume is not useable for 
secondary system recovery and resumption of primary 
applications. An out-of-sync volume at the secondary
site must be identified as such and secondary site
recovery-takeover procedures need to identify the out-of-sync
volumes for denying application access (forcing
the volumes off-line or changing their VOLSERs). The 
secondary site may be called upon to recover the pri- 
mary site at any instant wherein the primary site host is 
inaccessible - thus the secondary site requires all perti- 
nent information about a sync state of all volumes. The 
secondary storage subsystem, that is the secondary 
storage controllers and DASD, is unable to determine 
all conditions causing the primary site to break synchro- 
nism due to primary site-encountered exceptions. For 
example, the primary site may break a duplex pair if the 
primary site is unable to access the secondary peer due 
to a primary I/O path or link failure that the secondary
site is unaware of. In this case the secondary site shows 
in-sync state while the primary site indicates the duplex 
pair is broken. 

External communication may notify the secondary 
site that an out-of-sync duplex pair volume exists. This 
is realizable by employing a user systems management 
function. Primary I/O operations end with channel end/ 
device end/unit check (CE/DE/UC) status and sense da- 
ta indicates the nature of the error. With this form of I/O 
configuration an error recovery program (ERP) processes
the error and sends an appropriate message to the
secondary processor before posting to the primary application
that I/O is complete. The user is then responsible
for recognizing the ERP suspend duplex pair message and
securing that information at the secondary location. When
the secondary site is depended upon to become oper- 
ational in place of the primary site, a start-up procedure 
brings the secondary DASD on-line to the secondary 
host wherein sync status stored in the secondary DASD 
subsystem is retrieved for ensuring that out-of-sync volumes
are not brought on-line for application allocation.
This sync status merged with all ERP suspend duplex 
pair messages gives a complete picture of the second- 
ary out-of-sync volumes. 
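
The merge described above amounts to a set union: a volume is treated as out of sync if either the secondary DASD subsystem or an ERP suspend duplex pair message says so. A minimal sketch, assuming a hypothetical dictionary of per-volume sync flags and a list of VOLSERs named in ERP messages:

```python
def out_of_sync_volumes(secondary_status, erp_suspend_messages):
    """Union of volumes the secondary subsystem reports as out of sync and
    volumes named in ERP suspend duplex pair messages.

    secondary_status: dict mapping VOLSER -> True if in sync (assumed layout).
    erp_suspend_messages: iterable of VOLSERs named in ERP messages.
    """
    from_subsystem = {v for v, in_sync in secondary_status.items() if not in_sync}
    return from_subsystem | set(erp_suspend_messages)


def volumes_to_bring_online(secondary_status, erp_suspend_messages):
    """Volumes that may be brought on-line for application allocation."""
    blocked = out_of_sync_volumes(secondary_status, erp_suspend_messages)
    return sorted(v for v in secondary_status if v not in blocked)
```

The union matters for the link-failure case described earlier, in which the secondary subsystem still shows a volume as in sync while the primary site has already broken the pair.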

Referring now to FIG. 1, a disaster recovery system
10 is shown having a primary site 14 and a secondary
site 15, wherein the secondary site 15 is located, for example,
20 kilometers remote from the primary site 14.
The primary site 14 includes a host processor or primary
processor 1 having an application and system I/O and
Error Recovery Program 2 running therein (hereinafter
referred to as I/O ERP 2). The primary processor 1 could
be, for example, an IBM Enterprise Systems/9000
(ES/9000) processor running DFSMS/MVS operating software
and further may have several application programs
running thereon. A primary storage controller 3,
for example, an IBM 3990 Model 6 storage controller, is
connected to the primary processor 1 via a channel 12.
As is known in the art, several such primary storage controllers
3 can be connected to the primary processor 1,
or alternately, several primary processors 1 can be attached
to the primary storage controllers 3. A primary
DASD 4, for example, an IBM 3390 DASD, is connected
to the primary storage controller 3. Several primary
DASDs 4 can be connected to the primary storage controller
3. The primary storage controller 3 and attached
primary DASD 4 form a primary substorage system. Further,
the primary storage controller 3 and the primary
DASD 4 could be single integral units.

The secondary site 15 includes a secondary processor
5, for example, an IBM ES/9000, connected to a
secondary storage controller 6, for example an IBM
3990 Model 3, via a channel 13. A DASD 7 is further
connected to the secondary storage controller 6. The
primary processor 1 is connected to the secondary processor
5 by at least one host-to-host communication link
11, for example, channel links or telephone T1/T3 line
links, etc. The primary processor 1 may also have direct
connectivity with the secondary storage controller 6 by,
for example, multiple Enterprise Systems Connection
(ESCON) links 9. As a result, the I/O ERP 2 can communicate,
if required, with the secondary storage controller
6. The primary storage controller 3 communicates
with the secondary storage controller 6 via multiple
peer-to-peer links 8, for example, multiple ESCON links.

When a write I/O operation is executed by an application
program running in the primary processor 1, a
hardware status channel end/device end (CE/DE) is
provided indicating the I/O operation completed successfully.
Primary processor 1 operating system software
marks the application write I/O successful upon
successful completion of the I/O operation, thus permitting
the application program to continue to a next write
I/O operation which may be dependent upon the first or
previous write I/O operation having successfully completed.
On the other hand, if the write I/O operation was
unsuccessful, the I/O status of channel end/device end/
unit check (hereinafter referred to as CE/DE/UC) is presented
to the primary processor 1 operating system software.
Having presented unit check, the I/O ERP 2 takes
control obtaining specific sense information from the primary
storage controller 3 regarding the nature of the
failed write I/O operation. If a unique error to a volume
occurs then a unique status related to that error is provided
to the I/O ERP 2. The I/O ERP 2 can thereafter
perform new peer-to-peer synchronization error recovery
for maintaining data integrity between the primary
storage controller 3 and the secondary storage controller
6, or in the worst case, between the primary processor
1 and the secondary processor 5.

Referring to FIGs 2 and 3, the error recovery procedure
is set forth. In FIG. 2, a step 201 includes an
application program running in the primary processor 1
sending a data update to the primary storage controller
3. At step 203 the data update is written to the primary
DASD 4, and the data update is shadowed to the secondary
storage controller 6. At step 205 the duplex pair
status is checked to determine whether the primary and
secondary sites are synchronized. If the duplex pair status
is in a synchronized state, then the data update is
written to the secondary DASD 7 at step 207 while
processing then continues at the primary processor 1
via application programs running thereat.

In the case that the duplex pair is in a "failed" state,
then at step 209 the primary storage controller 3 notifies
the primary processor 1 that duplex pair has suspended
or failed. The duplex pair can become "failed" due to
communication failure between the primary storage
controller 3 and the secondary storage controller 6 via
communication links 8. Alternatively, duplex pair can become
"failed" due to errors in either the primary or secondary
subsystem. If the failure is in the communication
links 8, then the primary storage controller 3 is unable
to communicate the failure directly to the secondary
storage controller 6. At step 211 the primary storage
controller 3 returns I/O status CE/DE/UC to the primary
processor 1. The I/O ERP 2 quiesces the application
programs hence taking control of the primary processor
1 at step 213 for error recovery and data integrity before
returning control to the application requesting the write
I/O operation.

FIG. 3 represents steps performed by the I/O ERP
2. The I/O ERP 2 issues a sense I/O to the primary storage
controller 3 at step 221. The sense I/O operation
returns information describing the cause of the I/O error,
that is, the data description information is unique to the
storage controllers or duplex pair operation regarding
specific errors. In the event that the data description information
indicates that the peer-to-peer communication
links 8 have failed between the primary storage controller
3 and the secondary storage controller 6, then at
step 223 the I/O ERP 2 issues a storage controller level
I/O operation against the primary storage controller 3
and the secondary storage controller 6 indicating that
the affected volume is to be placed in failed synchronous
remote copy state. The secondary storage controller 6
is able to receive the state of the affected volume from
the I/O ERP 2 via the multiple ESCON links 9 or the
host-to-host communication link 11. Consequently, the
current status of the duplex pair operation is maintained
at both the primary processor 1 and the secondary processor
5 in conjunction with applications running in the
primary processor 1. Consoles 18 and 19 are provided
for communicating information from the primary processor
1 and secondary processor 5, respectively, wherein
the I/O ERP 2 posts status information to both consoles
18 and 19.

Data integrity has been maintained at step 225 up- 
on successful completion of the failed synchronous re- 
mote copy I/O operation to the primary storage control- 
ler 3 and the secondary storage controller 6. Therefore, 
if a recovery is attempted at the secondary site 15 the 
secondary storage controller 6 identifies the volume 
marked "failed synchronous remote copy" as not being 
useable until data on that volume are synchronized with 
other data in that synchronization group by data recov- 
ery means (conventional data base logs and/or journals 
for determining the state of that data on the volume). 



Step 227 tests to determine whether the I/O ERP 2
received successful completion of the I/O operations at
the primary storage controller 3 and the secondary storage
controller 6 on the failed synchronous remote copy
status update. Upon successful completion, the I/O ERP
2 returns control to the primary processor 1 at step 229.
Otherwise step 231 performs a next level recovery notification
which involves notifying an operator, via the
console 18, of the failed volume and that a status of that
volume at either the primary storage controller 3 or the
secondary storage controller 6 may not be correct. The
notification is shadowed to the secondary site 15, via
the console 19 or a shared DASD data set, for indicating
the specific volume status there.

An error log recording data set is updated at step
233. This update is written to either the primary DASD
4 or some other storage location and is shadowed to the
secondary site 15. Having completed the error recovery
actions, the I/O ERP 2, at step 235, posts to the primary
application write I/O operation a "permanent error" for
causing the primary application to perform its normal
"permanent error" recovery for the failed write I/O operation.
Once the error is corrected, the volume states can be
recovered, first to pending (recopy changed data) and
then back to full duplex. The data may later be re-applied
to the secondary DASD 7 once duplex pair is re-established.
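
The I/O ERP steps of FIG. 3 can be sketched in a few lines. This is a simplified model rather than the actual I/O ERP: the `Controller` class, the `reachable` flag, and the list-based console and error log are hypothetical stand-ins, and the sketch assumes that the error log update (step 233) and the "permanent error" posting (step 235) occur on both the successful and unsuccessful paths, which the text does not state explicitly.

```python
from dataclasses import dataclass, field

FAILED = "failed synchronous remote copy"


@dataclass
class Controller:
    """Hypothetical stand-in for a storage controller's volume status table."""
    reachable: bool = True
    volume_state: dict = field(default_factory=dict)

    def mark_failed(self, volume):
        """Attempt the status update; report whether it completed."""
        if not self.reachable:
            return False
        self.volume_state[volume] = FAILED
        return True


def io_erp(volume, primary, secondary, console, error_log):
    # Step 223: place the volume in failed state at both controllers.
    primary_ok = primary.mark_failed(volume)
    secondary_ok = secondary.mark_failed(volume)
    # Steps 227 and 231: if either update did not complete, notify the operator.
    if not (primary_ok and secondary_ok):
        console.append(f"volume {volume}: controller status may not be correct")
    # Step 233: update the error log (assumed to happen on both paths).
    error_log.append(volume)
    # Step 235: post a permanent error to the failed write I/O operation.
    return "permanent error"
```

Both status updates are attempted independently, so a secondary controller that cannot be reached does not prevent the primary controller from recording the failed state.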

When establishing a duplex pair a volume can be
identified as CRITICAL according to a customer's
needs. For a CRITICAL volume, when an operation results
in failing a duplex pair, a permanent error failure of
the primary volume is reported irrespective of the actual
error's location. With CRIT=Y, all subsequent attempts
to write to the primary DASD 405 of the failed pair will
receive a permanent error, ensuring that no data is written
to that primary volume that cannot also be shadowed
to the paired secondary volume. This permits complete
synchronization with the primary application actions and
the I/O data operations when required.
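
The effect of the CRITICAL attribute on later writes can be shown with a short sketch. The `PrimaryVolume` class and its attributes are hypothetical, and the use of a Python `OSError` to represent the permanent error is an assumption made for illustration.

```python
class PrimaryVolume:
    """Hypothetical model of a primary volume in a duplex pair."""

    def __init__(self, critical):
        self.critical = critical   # corresponds to CRIT=Y when True
        self.pair_failed = False   # set once the duplex pair has failed
        self.records = []

    def write(self, record):
        # A CRITICAL volume rejects every write once its pair has failed, so
        # nothing is written that cannot also be shadowed to the secondary.
        if self.critical and self.pair_failed:
            raise OSError("permanent error: duplex pair failed on CRITICAL volume")
        self.records.append(record)
```

A volume that is not marked CRITICAL keeps accepting writes after the pair fails, which corresponds to the primary site running exposed under suspended duplex pair conditions.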

Consequently, the disaster recovery system 10 described
herein introduces outboard synchronous remote
copy such that a primary host process error recovery
procedure having an I/O order (channel command
word (CCW)) may change a status of a primary and secondary
synchronous remote copy volume from duplex
pair to failed duplex thereby maintaining data integrity
for several types of primary and secondary subsystem
errors. Storage based back-up, rather than application
based back-up, wherein data updates are duplicated in
real time has been provided. The disaster recovery system
10 also attempts several levels of primary/secondary
status updates, including: (1) primary and secondary
storage controller volume status updates; (2) primary
and secondary host processor notification on specific
volume update status via operator messages or error
log recording common data sets; and (3) CRITICAL volume
indication, whereby future updates to the primary volume
can be prevented if the volume pair goes failed duplex.



EP 0 672 985 B1 



Hence, real time, full error disaster recovery is accomplished.

Asynchronous remote data shadowing is used 
when it is necessary to further increase a distance be- 
tween primary and secondary sites for reducing the 
probability that a single disaster will corrupt both primary 
and secondary sites, or when primary application per- 
formance impact needs to be minimized. While the dis- 
tance between primary and secondary sites can now 
stretch across the earth or beyond, the synchronization 
of write updates across multiple DASD volumes behind 
multiple primary subsystems to multiple secondary sub- 
systems is substantially more complicated. Record write 
updates can be shipped from a primary storage control- 
ler via a primary data mover to a secondary data mover 
for shadowing on a secondary storage subsystem, but 
the amount of control data passed therebetween must 
be minimized while still being able to re-construct an ex- 
act order of the record write updates on the secondary 
system across several storage controllers as occurred 
on the primary system across multiple DASD volumes 
behind several storage controllers. 

FIG. 4 depicts an asynchronous disaster recovery
system 400 including a primary site 421 and a remote
or secondary site 431. The primary site 421 includes a
primary processor 401, for example, an IBM ES/9000
running DFSMS/MVS host software. The primary
processor 401 further includes application programs 402
and 403, for example, IMS and DB2 applications, and a
primary data mover (PDM) 404. A common sysplex
clock 407 is included in the primary processor 401 for
providing a common reference to all applications (402,
403) running therein, wherein all system clocks or time
sources (not shown) synchronize to the sysplex clock
407, ensuring that all time dependent processes are properly
timed relative to one another. The primary storage
controllers 405, for example, synchronize to a resolution
appropriate to ensure differentiation between record write
update times, such that no two consecutive write I/O
operations to a single primary storage controller 405 can
exhibit the same time stamp value. The resolution, and
not the accuracy, of the sysplex timer 407 is critical. The
PDM 404, though shown connected to the sysplex timer
407, is not required to synchronize to the sysplex timer
407 since write I/O operations are not generated therein.
A sysplex timer 407 is not required if the primary
processor 401 has a single time reference (for example, a
single multi-processor ES/9000 system).

A plurality of primary storage controllers 405, for ex- 
ample, IBM 3990 Model 6 storage controllers, are con- 
nected to the primary processor 401 via a plurality of 
channels, for example, fiber optic channels. Connected 
to each primary storage controller 405 is at least one 
string of primary DASDs 406, for example, IBM 3390 
DASDs. The primary storage controllers 405 and the pri- 
mary DASDs 406 form a primary storage subsystem. 
Each storage controller 405 and primary DASD 406
need not be separate units, but may be combined into
a single drawer.

The secondary site 431, located, for example, some
thousands of kilometers from the primary site 421,
includes, similar to the primary site 421, a secondary
processor 411 having a secondary data mover (SDM) 414
operating therein. Alternatively, the primary and
secondary sites can be the same location, and further,
the primary and secondary data movers can reside on a
single host processor (secondary DASDs may be just over
a fire-wall). A plurality of secondary storage controllers
415 are connected to the secondary processor 411 via
channels, for example, fiber optic channels, as is known
in the art. Connected to the storage controllers 415 are
a plurality of secondary DASDs 416 and a control
information DASD(s) 417. The storage controllers 415 and
DASDs 416 and 417 comprise a secondary storage subsystem.

The primary site 421 communicates with the secondary
site 431 via a communication link 408. More specifically,
the primary processor 401 transfers data and control
information to the secondary processor 411 by a
communications protocol, for example, a virtual
telecommunications access method (VTAM) communication
link 408. The communication link 408 can be realized by
several suitable communication methods, including
telephone (T1, T3 lines), radio, radio/telephone,
microwave, satellite, etc.

The asynchronous data shadowing system 400 encompasses
collecting control data from the primary storage
controllers 405 so that an order of all data writes to
the primary DASDs 406 is preserved and applied to the
secondary DASDs 416 (preserving the data write order
across all primary storage subsystems). The data and
control information transmitted to the secondary site
431 must be sufficient such that the presence of the
primary site 421 is no longer required to preserve data
integrity.

The applications 402, 403 generate data or record
updates, which record updates are collected by the
primary storage controllers 405 and read by the PDM 404.
The primary storage controllers 405 each group their
respective record updates for an asynchronous remote
data shadowing session and provide those record updates
to the PDM 404 via non-specific primary DASD 406 READ
requests. Transferring record updates from the primary
storage controllers 405 to the PDM 404 is controlled and
optimized by the PDM 404 for minimizing a number of
START I/O operations and time delay between each read,
yet maximizing an amount of data transferred between
each primary storage controller 405 and the primary
processor 401. The PDM 404 can vary a time interval
between non-specific READs to control this primary
storage controller-host optimization as well as a
currency of the record updates for the secondary DASDs
416.

Collecting record updates by the PDM 404, and
transmitting those record updates to the SDM 414, while
maintaining data integrity, requires the record updates
to be transmitted for specific time intervals and in
appropriate multiple time intervals with enough control
data to reconstruct the primary DASDs 406 record WRITE
sequence across all primary storage subsystems to the
secondary DASDs 416. Re-constructing the primary
DASDs 406 record WRITE sequences is accomplished
by passing self describing records from the PDM 404 to
the SDM 414. The SDM 414 inspects the self describing
records for determining whether any records for a given
time interval have been lost or are incomplete.

FIGs 5 and 6 show a journal record format created 
by the PDM 404 for each self describing record, includ- 
ing a prefix header 500 (FIG. 5), and a record set infor- 
mation 600 (FIG. 6) as generated by the primary storage 
controller 405. Each self describing record is further 
journaled by the SDM 414 for each time interval so that 
each self describing record can be applied in time se- 
quence for each time interval to the secondary DASDs 
416. 

Referring now to FIG. 5, the prefix header 500, 
which is inserted at the front of each record set, includes 
a total data length 501 for describing the total length of 
the prefix header 500 and actual primary record set in- 
formation 600 that is transmitted to the SDM 414 for 
each record set. An operational time stamp 502 is a time 
stamp indicating a start time for the operational set that 
the PDM 404 is currently processing. The operational 
time stamp 502 is generated by the PDM 404 (according 
to the sysplex timer 407) when performing a READ 
RECORD SET function to a set of the primary storage 
controllers 405. An I/O time 610 (FIG. 6) of the primary 
DASDs 406 write is unique for each primary storage 
controller 405 READ RECORD SET. The operational 
time stamp 502 is common across all storage control- 
lers. 

A READ RECORD SET command is issued by the 
PDM 404 and can be predicated upon one of the follow- 
ing conditions: 

(1) Primary storage controller 405 attention interrupt
based upon that primary storage controller's
predetermined threshold;

(2) Primary processor 401 timer interrupt based up- 
on a predetermined time interval; or 

(3) Record set information indicates additional in- 
formation on outstanding record sets available but 
not yet read. 

Condition (2) uses a timer interval to control how far 
behind the secondary system 431 executes during pe- 
riods of low activity. Condition (3) occurs when the PDM 
404 fails to drain all record sets during a processing in- 
terval which drives further activity for ensuring that the 
PDM 404 keeps up with primary storage controller 405 
activity. 
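
The three conditions above can be illustrated with a minimal sketch. The class name ControllerState, its fields, and the function should_issue_read are hypothetical names chosen for this illustration and do not appear in the described system; the sketch only shows that any one of the conditions is sufficient to drive a READ RECORD SET.

```python
from dataclasses import dataclass


@dataclass
class ControllerState:
    """Hypothetical snapshot of what the PDM knows about one controller."""
    attention_raised: bool           # condition (1): controller threshold reached
    seconds_since_last_read: float   # compared with the timer interval, condition (2)
    more_record_sets_pending: bool   # condition (3): flagged in record set information


def should_issue_read(state: ControllerState, timer_interval: float) -> bool:
    """Return True if any of the three conditions calls for a READ RECORD SET."""
    timer_expired = state.seconds_since_last_read >= timer_interval  # condition (2)
    return state.attention_raised or timer_expired or state.more_record_sets_pending
```

In such a sketch, a shorter timer_interval would keep the secondary site more current during periods of low activity, at the cost of more read operations.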

A time interval group number 503 is supplied by the
PDM 404 to identify a time interval (bounded by
operational time stamp 502 and a records read time 507) to
which the current record sets belong (sets of records
across all primary storage controllers 405 for a given
time interval group form consistency groups). A
sequence number within group 504 is derived based upon
a hardware provided identification (to the PDM 404) of
a write sequence order of application WRITE I/Os for
primary storage controller 405 for each record set within
a given time interval group 503. A primary SSID
(substorage identification) 505 uniquely identifies the
specific primary storage controller of the primary
storage controllers 405 for each record set. A secondary
target volume 506 is assigned by either the PDM 404 or
the SDM 414 depending upon performance considerations.
A records read time 507 supplies an operational
time stamp that is common to all primary storage
controllers 405 indicating an end time for the PDM 404 read
record set process current interval.

The operational time stamp 502 and the records 
read time 507 are used by the PDM 404 to group sets 
of read record sets from each of the primary storage 
controllers 405. Time synchronization for grouping sets 
of read record sets is key only to the PDM 404 and as 
such, the PDM 404 could be synchronized to a central 
processing unit (CPU) clock running only the PDM 404 
not attached to the sysplex timer 407. The PDM 404
does not write record updates, but the record updates, 
as stated previously, must be synchronized to a com- 
mon time source. 

Referring now to FIG. 6, the record set information
600 is generated by the primary storage controllers 405
and collected by the PDM 404. Update Specific
Information 601-610 includes a primary device unit address
601 of each record indicating the actual primary DASD
406 that the record update occurred on. A cylinder
number/head number (CCHH) 602 indicates a location
on primary DASD 406 for each record update. Primary
SSID 603, the primary storage controller session
identifier, is the same as primary SSID 505. Status flags 604
provide status information regarding whether specific
data records 620 follow. Sequence numbers 605 and
630 assign a number to each record for indicating
whether the entire record set has been read (all data
transferred to the PDM 404). Primary DASD write I/O
type 606 is an operation indicator identifying the type of
write operation performed on each record, the operation
indicators including: update write; format write; partial
track records follow; full track data follows; erase
command performed; or write any performed. Search
argument 607 indicates initial positioning information for the
first read record set data record 620. A sector number
608 identifies the sector at which the record was
updated. Count field 609 describes a number of specific record
data fields 620 that follow. A host application time when
the primary DASD 406 write update occurred is recorded
in time of updates 610. Specific record data 620
provides a count/key/data (CKD) field of each record
update. Lastly, the sequence number 630 is compared to
the sequence number 605 for indicating whether the
entire read record set was transferred to the PDM 404.

The update records are handled in software groups
called consistency groups so that the SDM 414 can copy
the record updates in the same order they were written
at the primary DASDs 406. The information used for
creating the consistency groups (across all record sets
collected from all storage controllers 405) includes the:
operational time stamp 502; time interval group number
503; sequence number within group 504; primary
controller SSID 505; records read time 507; primary device
address 601; the primary SSID 603; and the status flags
604. The information used for determining whether all
records for a time interval group have been received for
each storage controller 405 at the SDM 414 includes
the: time interval group number 503; sequence number
within group 504; physical controller ID 505; the primary
SSID 603; and a total number of read record sets
returned from each primary storage controller 405 for each
operational time interval. The information necessary to
place record updates on the secondary DASDs 416
equivalent to the primary DASDs 406 record updates
with full recovery possible includes the: secondary target
volume 506; CCHH 602; primary DASD write I/O type
606; search argument 607; sector number 608; count
609; time of updates 610; and the specific record data
620.

FIGs 7 and 8 show a state table 700 and a master 
journal 800, respectively, for describing a current journal 
contents, which simplifies recovery time and journal 
transfer time. The state table 700 provides configuration 
information, collected by and common to the PDM 404 
and SDM 414, and includes primary storage controller 
session identifiers (SSID numbers) and the volumes 
therein, and the corresponding secondary storage con- 
troller session identifiers and the corresponding vol- 
umes. Thus the configuration information tracks which 
primary volumes 710 or primary DASD extents map to 
secondary volumes 711 or secondary DASD extents. 
With a simple extension to the state table 700 indicating
partial volume extents 712 (CCHH to CCHH), partial
volume remote copy can be accomplished using the same
asynchronous remote copy methods described herein,
but for a finer granularity (track or extent) than full
volume.

The master journal 800 includes: consistency group 
number; location on journal volumes; and operational 
time stamp. The master journal 800 further maintains 
specific record updates as grouped in consistency 
groups. The state table 700 and master journal 800 sup- 
port disaster recovery, and hence must be able to oper- 
ate in a stand-alone environment wherein the primary 
system 401 no longer exists. 

A time stamp control is placed at the front and back
of each master journal 800 to ensure that the entire
control entry was successfully written. The time stamp
control is further written to the secondary DASDs 417. The
control elements include dual entries (1) and (2),
wherein one entry is always a current entry, for example:

(1) Timestamp Control | Control Info | Timestamp Control

(2) Timestamp Control | Control Info | Timestamp Control.

At any point in time either entry (1) or (2) is the
current or valid entry, wherein a valid entry is that entry with
equal timestamp controls at the front and back. Disaster
recovery uses the valid entry with the latest timestamp
to obtain control information. This control information,
along with state information (environmental information
regarding storage controllers, devices, and applied
consistency groups), is used for determining what record
updates have been applied to the secondary storage
controllers 415.
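
The selection of the valid entry can be sketched as follows. This is an illustration only: the class ControlEntry, its field names, and the use of integers for the time stamp controls are assumptions made for the sketch, not the on-disk layout of the master journal 800.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ControlEntry:
    """Hypothetical in-memory form of one of the dual control entries."""
    front_stamp: int     # time stamp control written before the control info
    control_info: bytes
    back_stamp: int      # time stamp control written after the control info

    def is_valid(self) -> bool:
        # An entry that was only partially written has mismatched stamps.
        return self.front_stamp == self.back_stamp


def select_current_entry(entries: List[ControlEntry]) -> Optional[ControlEntry]:
    """Pick the valid entry with the latest time stamp, or None if none is valid."""
    valid = [e for e in entries if e.is_valid()]
    return max(valid, key=lambda e: e.front_stamp) if valid else None
```

Because an interrupted write leaves the front and back stamps unequal, the older entry remains selectable until the newer one has been written completely.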

After all read record sets across all primary storage
controllers 405 for a predetermined time interval are
received at the secondary site 431, the SDM 414
interprets the received control information and applies the
received read record sets to the secondary DASDs 416
in groups of record updates such that the record updates
are applied in the same sequence that those record
updates were originally written on the primary DASDs 406.
Thus, all primary application order (data integrity)
consistency is maintained at the secondary site 431. This
process is hereinafter referred to as forming consistency
groups. Forming consistency groups is based on the
following assumptions: (A) application writes that are
independent can be performed in any order if they do not
violate controller sequence order; (B) application writes
that are dependent must be performed in timestamp
order, hence an application cannot perform a dependent
write number two before receiving control unit end,
device end from write number one; and (C) a second write
will always be either (1) in a same record set consistency
group as a first write with a later timestamp or (2) in a
subsequent record set consistency group.

Referring to FIG. 9, an example of forming a
consistency group (the consistency group could be formed
at either the primary site 421 or secondary site 431), for
example, for storage controllers SSID 1, SSID 2, and
SSID 3 is shown (any number of storage controllers can
be included but three are used in this example for
clarity). Time intervals T1, T2 and T3 are assumed to occur
in ascending order. An operational time stamp 502 of
time interval T1 is established for storage controllers
SSID 1, SSID 2 and SSID 3. The PDM 404 obtains
record set data from storage controller SSIDs 1, 2, and
3 for time intervals T1-T3. The record sets for SSIDs 1,
2, and 3 for time interval T1 are assigned to time interval
group 1, G1 (time interval group number 503). The
sequence number within group 504 is shown for each
SSID 1, 2, and 3, wherein SSID 1 has three updates at
11:59, 12:00, and 12:01, SSID 2 has two updates at
12:00 and 12:02, and SSID 3 has three updates at 11:58,
11:59, and 12:02. Record sets of time intervals T2 and
T3 are listed but example times of updates are not given
for simplicity.

Consistency group N can now be formed based upon
the control information and record updates received
at the secondary site 431. In order to ensure that no
record update in time interval group number one is later
than any record update of time interval group number
two, a min-time is established which is equal to the
earliest read record set time of the last record updates
for each storage controller SSID 1, 2, and 3. In this
example then, min-time is equal to 12:01. Any record
updates having a read record set time greater than or equal
to min-time are included in the consistency group N + 1.
If two record update times to a same volume were equal,
though unlikely given sufficient resolution of the sysplex
timer 407, the record update having the earlier
sequence number within the time interval group N is kept
with that group for consistency group N. The record
updates are now ordered based upon read record set
times. Record updates having equal times will cause the
record update having the lower sequence number to be
placed before the later sequence numbered record
update. Alternatively, record updates having equal time
stamps, but to differing volumes, may be ordered
arbitrarily as long as they are kept in the same consistency
group.
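
The min-time computation for the FIG. 9 example can be worked through in a short sketch. The function name form_group and the data layout are assumptions made for illustration. The sketch keeps a record whose time equals min-time in the current group, which follows the FIG. 11 description of passing on only those updates later than min-time; under the "greater than or equal" wording of the paragraph above, the comparison `<=` would instead be `<`.

```python
from typing import Dict, List, Tuple

# Update times per storage controller for time interval T1 (FIG. 9 example).
# Zero-padded "HH:MM" strings compare correctly as plain strings.
updates: Dict[str, List[str]] = {
    "SSID 1": ["11:59", "12:00", "12:01"],
    "SSID 2": ["12:00", "12:02"],
    "SSID 3": ["11:58", "11:59", "12:02"],
}


def form_group(by_ssid: Dict[str, List[str]]) -> Tuple[str, list, list]:
    """Split updates into the current group and those deferred to the next one."""
    # Controllers with no updates do not participate.
    active = {ssid: times for ssid, times in by_ssid.items() if times}
    # min-time: the earliest of each controller's last (latest) update time.
    min_time = min(max(times) for times in active.values())
    current, deferred = [], []
    for ssid, times in active.items():
        for seq, t in enumerate(times):   # seq stands in for sequence number 504
            (current if t <= min_time else deferred).append((t, seq, ssid))
    # Order by time; ties fall back to sequence number (arbitrary across volumes).
    current.sort()
    return min_time, current, deferred


min_time, current, deferred = form_group(updates)
```

Here min_time is the smallest of 12:01, 12:02, and 12:02, so the two 12:02 updates from SSID 2 and SSID 3 are carried into the next group.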

If a primary storage controller 405 fails to complete
a response to a read record set during a specified time
interval, then a consistency group cannot be formed
until that primary storage controller 405 completes. In the
event that the primary storage controller 405 fails to
complete its operation, then a missing interrupt results
causing a system missing interrupt handler to receive
control and the operation will be terminated. On the
other hand, if the primary storage controller 405 timely
completes the operation then the I/O will be driven to
completion and normal operation will continue. Consistency
group formation expects that write operations against
the primary storage controllers 405 will have time
stamps. Some programs, however, will cause writes to
be generated without time stamps, in which case the
primary storage controller 405 will return zeros for the time
stamp. Consistency group formation can bound those
records without time stamps based upon the timestamp
that the data was read. If too many record updates
without time stamps occur over a time interval such that the
record updates are not easily bounded by consistency
group times, then an error that the duplex volumes are
out of synchronization may result.

FIGs 10 and 11 are flow diagrams presenting the
method of forming consistency groups. Referring to FIG.
10, the process starts at step 1000 with the primary site
421 establishing remote data shadowing to occur. At
step 1010 all application I/O operations are time
stamped using the sysplex timer 407 as a synchronization
clock (FIG. 4). The PDM 404 starts a remote data
shadowing session with each primary storage controller
405 at step 1020, which includes identifying those
primary volumes that will have data or records shadowed.
Record set information 600 is trapped by the primary
storage controllers 405 for each application WRITE I/O
operation (see FIG. 6) by step 1030.

Step 1040 involves the PDM 404 reading the captured
record set information 600 from each primary storage
controller 405 according to a prompt including an
attention message, a predetermined timing interval, or
a notification of more records to read as described
earlier. When the PDM 404 begins reading record sets, at
step 1050, the PDM 404 prefixes each record set with
a prefix header 500 (see FIG. 5) for creating specific
journal records (a journal record includes the prefix
header 500 and the record set information 600). The
journal records contain the control information (and
records) necessary for forming consistency groups at
the secondary site 431 (or at the primary site 421).

At step 1060 the PDM 404 transmits the generated
journal records to the SDM 414 via the communications
link 408 (or within the same data mover system if the
consistency groups are formed therein). The SDM 414
uses the state table 700 at step 1070 to gather the
received record updates by group and sequence numbers
for each time interval group and primary storage
controller 405 established for the data shadowing session.
The SDM 414 inspects the journal records at step 1080
to determine whether all record information has been
received for each time interval group. If the journal
records are incomplete, then step 1085 causes the SDM
414 to notify the PDM 404 to resend the required record
sets. If the PDM 404 is unable to correctly resend, then
the duplex volume pair is failed. If the journal records
are complete, then step 1090 is performed which
encompasses the SDM 414 forming the consistency
groups.
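
The completeness inspection of steps 1080 and 1085 can be sketched as a search for gaps in the received sequence numbers. The function find_missing, the tuple layout, and the assumption that sequence numbers run from 1 to the reported total are all hypothetical choices made for this illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

# (time interval group number, primary SSID) identifies one controller's
# contribution to one interval; the pair mirrors fields 503 and 505.
Key = Tuple[int, int]


def find_missing(received: Iterable[Tuple[int, int, int]],
                 expected_counts: Dict[Key, int]) -> Dict[Key, List[int]]:
    """Return the sequence numbers still missing for each (group, SSID).

    `received` holds (group, ssid, sequence) triples taken from the journal
    records; `expected_counts` gives the total number of read record sets
    each controller returned for the interval. Sequence numbers are assumed
    to run 1..count (an assumption of this sketch).
    """
    seen: Dict[Key, Set[int]] = {}
    for group, ssid, seq in received:
        seen.setdefault((group, ssid), set()).add(seq)
    missing: Dict[Key, List[int]] = {}
    for key, count in expected_counts.items():
        gaps = sorted(set(range(1, count + 1)) - seen.get(key, set()))
        if gaps:
            missing[key] = gaps   # these would be requested again (step 1085)
    return missing
```

An empty result corresponds to the complete case, in which consistency group formation (step 1090) can proceed.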

Referring to FIG. 11, steps 1100-1160 representing
step 1090 (FIG. 10) for forming consistency groups are
shown. Consistency group formation starts at step 1100
wherein each software consistency group is written to
an SDM 414 journal log ("hardened") on the secondary
DASD 417 (FIG. 4). Step 1110 performs a test for
determining whether the time interval groups are complete,
that is, each primary storage controller 405 must have
either presented at least one read record set buffer or
have confirmation from the PDM 404 that no such record
updates were placed in the record set buffer, and all read
record set buffers with data (or null) must have been
received by the SDM 414. If a time interval group is
incomplete, then step 1110 retries reading the record sets from
the primary storage controller 405 until the required data
is received. If errors occur, a specific duplex volume pair
or pairs may be failed. Having received complete time
interval groups, step 1120 determines a first consistency
group journal record. The first (or current) consistency
group journal record is that record which contains the
earliest operational time stamp 502 and the earliest time
of update 610 of all records having equal operational
time stamps 502.

Step 1130 inspects the records contained in the current
consistency group journal record to determine
which record will be the last record to be included therein 
(some records will be dropped and included in the next 
consistency group journal record). The last record in the 
current consistency group journal record is determined 
as a minimum update time (min-time) of the maximum 
update times for each primary storage controller 405 
(that is, the last update of each primary storage control- 
ler 405 is compared and only the earliest of these re- 
mains in the current consistency group journal record). 

Those remaining record updates in the current
consistency group journal record are ordered according to
time of update 610 and sequence number within group
504 by step 1140. A primary storage controller 405 that
had no record updates does not participate in the
consistency group. At step 1150, the remaining record
updates of the current consistency group (having update
times later than min-time) are passed to the next
consistency group. Each sequence number within a group
504 should end with a null buffer indicating that all read
record sets have been read for that operational time
interval. If the null buffer is absent, then the step 1120 of
defining the last record in the current software
consistency group, coupled with the records read time 507 and
time of update 610 can be used to determine the proper
order of the application WRITE I/O operations across
the primary storage controllers 405.

Step 1160 represents a back-end of the remote data
shadowing process wherein specific write updates are
applied to secondary DASDs 416 under full disaster
recovery constraint. If, when writing the updates to the
secondary DASDs 416, an I/O error occurs, or the entire
secondary site 431 goes down and is re-initialized, then the
entire consistency group that was in the process of
being written can be re-applied from the start. This permits
the remote shadowing to occur without having to track
which secondary DASDs 416 I/Os have occurred, which
I/Os have not occurred, and which I/Os were in process,
etc.

A key component of step 1160 is that the SDM 414
causes the records to be written efficiently to the
secondary DASDs 416 so that the secondary site 431 can
keep pace with the primary site 421. The requisite
efficiency is accomplished, in part, by concurrently
executing multiple I/O operations to different secondary
DASDs 416. Serially writing one secondary DASD 416
at a time would cause the secondary site 431 to fall too
far behind the primary site 421. Yet more efficiency is
gained at the secondary site 431 by writing the records
for each consistency group destined for a single
secondary device via a single channel command word
(CCW) chain. Within each single CCW chain, the I/O
operations therein can be further optimized as long as
those I/O operations to each secondary DASD 416 data
track are maintained in the order of occurrence on the
primary volumes.

Optimizing secondary I/O operations for specific
consistency groups and within single CCW chains is
based in part upon the pattern of primary write I/O
operations, and in part upon the physical characteristics of
the secondary DASDs 416. Optimization may vary
somewhat depending upon whether secondary DASD
416 is count/key/data (CKD), extended count/key/data
(ECKD), fixed block architecture (FBA), etc.
Consequently, a number of WRITE I/Os (m) to a primary DASD
406 volume during a given time interval can be reduced
to a single START I/O operation to a secondary DASD
416 volume. This optimization of the number of START
I/Os to the secondary storage controllers 415 of m:1 can
allow the secondary DASDs 416 to catch up with and
thereby more closely shadow the record updates at the primary
site 421.
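
The m:1 reduction can be illustrated by collecting the writes of one consistency group into one chain per secondary volume. The type Write, the function build_chains, and the volume and track labels below are hypothetical; the sketch models only the grouping, not the construction of actual CCWs.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Write(NamedTuple):
    """Hypothetical stand-in for one record update in a consistency group."""
    volume: str     # secondary target volume (field 506)
    cchh: str       # track address (field 602)
    payload: bytes


def build_chains(group: List[Write]) -> Dict[str, List[Write]]:
    """Collect the writes of one consistency group into one chain per volume.

    `group` is assumed to be already in primary write order; appending in
    iteration order keeps that order within each chain, and therefore
    within each track of each volume.
    """
    chains: Dict[str, List[Write]] = defaultdict(list)
    for write in group:
        chains[write.volume].append(write)
    return dict(chains)


group = [
    Write("VOL001", "0001/0", b"a"),
    Write("VOL002", "0007/3", b"b"),
    Write("VOL001", "0001/0", b"c"),
    Write("VOL001", "0002/1", b"d"),
]
chains = build_chains(group)
# Four primary WRITE I/Os become two chains, i.e. two START I/O operations.
```

Chains for different volumes are independent of one another in this sketch, which is what would allow them to be executed concurrently.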

A key to successful remote data shadowing, and
hence secondary I/O optimization, is minimizing
unrecoverable errors in any of the concurrent multiple I/O
operations to secondary DASDs 416 so that consistent
copies are available for recovery. A failure in a given
secondary write could permit a later dependent write to
be recorded without the conditioning write (e.g., a log
entry indicating that a data base record has been
updated when in reality the actual update write for the data
base had failed violates the sequence integrity of the
secondary DASD 416 copy).

A failed secondary DASD 416 copy is unusable for
application recovery until that failure to update has been
recovered. The failed update could be corrected by having
the SDM 414 request a current copy from the PDM 404.
In the meantime the secondary data copy is inconsistent
and hence unusable until the PDM 404 responds
with the current update and all other previous updates
are processed by the SDM 414. The time required to
recover the failed update typically presents an
unacceptably long window of non-recovery for adequate
disaster recovery protection.

Effective secondary site 431 I/O optimization is
realized by inspecting the data record sets to be written
for a given consistency group and building chains based
upon rules of the particular secondary DASD 416
architecture, for example, an ECKD architecture. The
optimization technique disclosed herein simplifies recovery
from I/O errors such that when applying a consistency
group, if an I/O error occurs, the CCW chain can be
re-executed, or in the event of a secondary initial program
load (IPL) recovery, the entire consistency group can be
re-applied without data loss.

FIG. 12 summarizes full consistency group recovery
(FCGR) rules for building CCW chains for all WRITE
I/O combinations for an ECKD architecture, wherein
CCHHR record format is used (cylinder number, head
number, record number). FIG. 12 is created by inspecting
each possible combination of WRITE I/O operations
to a DASD track within a consistency group's scope. The
FCGR rules of FIG. 12, described in FIGs 13A and 13B,
are then followed to govern data placement (secondary
DASD 416 I/O write CCW chains) for yielding full
recovery for an error in applying a consistency group. The
FCGR rules depicted in FIG. 12 would be extended
appropriately as new WRITE I/O operations are added. These
rules can exist in hardware or software at the secondary
site 431. The FCGR rules advantageously reduce
READ record set to a same DASD track analysis to an
inspection of the primary DASD 406 WRITE I/O type,
search argument, and count and key fields.

If a DASD track is written without inspecting the
consistency group write operations as shown in FIG. 12,
then previously written data records potentially cannot
be re-written. For example, assume that a chain
includes:

WRITE UPDATE to record five; and 

FORMAT WRITE to record one, 

wherein record one and record five occur on the same 
DASD track with record one preceding record five. 
Record five is updated by an UPDATE WRITE CCW and 
a FORMAT WRITE I/O CCW updates record one eras- 
ing a remainder of the track thus deleting record five. If 
this chain had to be re-executed, a LOCATE RECORD 
CCW that will position to the beginning of record five will 
no longer have a positioning point (record five no longer 
exists), and the chain is not fully recoverable from the 
beginning. Since the write operations have already been 
successful at the primary site 421 , always being able to 
apply an entire consistency group on the secondary 
DASD 416 is required to maintain data consistency and 
integrity. 
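
The hazard in the example above can be illustrated with a
small sketch. It is not the FCGR rule set of FIG. 12, which
is not reproduced here. It is a simplified model under
stated assumptions: a track is a set of record numbers, an
UPDATE write must be able to position at an existing record,
and a FORMAT write is assumed to always find a positioning
point and to erase every record after the one it writes.
The names TrackWrite, apply_chain and is_reexecutable are
hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackWrite:
    kind: str    # "UPDATE" or "FORMAT" (simplified write types)
    record: int  # record number on the track


def apply_chain(track, chain):
    """Apply a chain of writes to a track modeled as a set of record numbers.

    Raises LookupError when an UPDATE cannot position at its record.
    """
    records = set(track)
    for write in chain:
        if write.kind == "UPDATE":
            if write.record not in records:
                raise LookupError(f"cannot position to record {write.record}")
        elif write.kind == "FORMAT":
            # Assumption: a format write rewrites its record and erases
            # the remainder of the track behind it.
            records = {r for r in records if r < write.record} | {write.record}
        else:
            raise ValueError(f"unknown write type {write.kind!r}")
    return records


def is_reexecutable(track, chain):
    """Return True if the chain can be applied a second time from the start."""
    after_first_pass = apply_chain(track, chain)
    try:
        apply_chain(after_first_pass, chain)
    except LookupError:
        return False
    return True


track = {1, 2, 3, 4, 5}
unsafe = [TrackWrite("UPDATE", 5), TrackWrite("FORMAT", 1)]
safe = [TrackWrite("UPDATE", 1), TrackWrite("UPDATE", 5)]
print(is_reexecutable(track, unsafe), is_reexecutable(track, safe))
```

In this model the first chain fails on re-execution because
record five no longer exists after the format write, which
is the situation the FCGR rules are meant to detect before
the chain is issued.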

FIG. 14, steps 1410 through 1470, provides more
detail as to the process represented by step 1160 of FIG.
11, while using the FCGR rules defined in FIG. 12. At
step 1410 the SDM 414 divides the records of the current
consistency group into two categories. A first category
includes I/O orders directed to a same secondary
DASD volume, and a second category includes I/O orders
of those records in the first category that are directed
to a same CCHH (i.e., records being updated to a
same DASD track).
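
A minimal sketch of the two-level categorization of step
1410 follows. The RecordUpdate type, its field names and
the volume identifiers are assumptions made for
illustration; the text only states that records are grouped
by secondary DASD volume and, within a volume, by CCHH.

```python
from collections import defaultdict
from typing import NamedTuple, Tuple


class RecordUpdate(NamedTuple):
    volume: str             # hypothetical secondary DASD volume identifier
    cchh: Tuple[int, int]   # (cylinder, head), i.e. one DASD track
    record: int             # record number on the track
    sequence: int           # position within the consistency group


def categorize(group):
    """Split a consistency group by volume and by track within a volume."""
    by_volume = defaultdict(list)   # first category: same volume
    by_track = defaultdict(list)    # second category: same volume and CCHH
    for update in sorted(group, key=lambda u: u.sequence):
        by_volume[update.volume].append(update)
        by_track[(update.volume, update.cchh)].append(update)
    return dict(by_volume), dict(by_track)


group = [
    RecordUpdate("VOL1", (3, 0), 5, 2),
    RecordUpdate("VOL2", (1, 1), 1, 1),
    RecordUpdate("VOL1", (3, 0), 1, 3),
    RecordUpdate("VOL1", (4, 0), 2, 4),
]
by_volume, by_track = categorize(group)
print(sorted(by_volume), len(by_track[("VOL1", (3, 0))]))
```

Sorting by sequence before grouping keeps each list in the
order the updates occurred, so a later per-track inspection
sees the writes in the order they would be applied.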

Having categorized the records of the current
consistency group, step 1420 conforms application WRITE
I/Os and SDM 414 WRITE I/Os to the architecture of the
secondary DASDs 416, for example, to ECKD architecture
FCGR rules (see FIG. 12) for identifying data placement
on a track and track/record addressing. The SDM
414 groups secondary DASD WRITE I/O operations to
the same volumes into single I/O CCW chains at step
1430. Step 1440 involves moving the head disk assembly
(HDA) of each secondary DASD 416 according to
search arguments and specific record data (CKD fields)
for the actual secondary DASD 416 writes.

Step 1450 compares READ SET BUFFERS one
and two for those records making up the second
categories (there typically will be a plurality of second
categories, one for each track receiving records), using the
FCGR rules of FIG. 12 for determining whether a
subsequent write operation invalidates a previous write
operation or DASD search argument (positioning at a
record that is now erased, etc.). The READ SET BUFFERS
one and two contain adjacent read record sets. Following
the FCGR rules ensures that the SDM 414 can
re-write an entire consistency group, in the event of an
error, without re-receiving record updates from the
primary site 421. After the SDM 414 applies the current
consistency group to the secondary DASD 416, step
1460 updates the state table (FIG. 7) and the master
journal (FIG. 8).

The remote copy process continues in real time as
step 1470 gets a next consistency group (which becomes
the current consistency group) and returns
processing to step 1410. The remote copy process will
stop if the primary site 421 to secondary site 431
communication terminates. The communication may terminate
if volume pairs are deleted from the process by the
PDM 404, the primary site is destroyed (disaster occurs),
an orderly shutdown is performed, or a specific
takeover action occurs at the secondary site 431.
Consistency groups journaled on the secondary site 431 can
be applied to the secondary DASD 416 during a takeover
operation. The only data lost is that data captured
by the primary site 421 that has not been completely
received by the SDM 414.

In summary, synchronous and asynchronous remote
data duplexing systems have been described. The
asynchronous remote data duplexing system provides
storage based, real time data shadowing. A primary site
runs applications generating record updates, and a
secondary site, remote from the primary site, shadows the
record updates and provides disaster recovery for the
primary site. The asynchronous remote data duplexing
system comprises a sysplex timer for synchronizing
time dependent processes in the primary site, and a
primary processor at the primary site for running the
applications, the primary processor having a primary data
mover therein. A plurality of primary storage controllers
are coupled to the primary processor for issuing write
I/O operations for each record update, each primary
storage controller DASD write I/O operation being
synchronized to the sysplex timer. A plurality of primary storage
devices receive the write I/O operations and store the
record updates therein accordingly. The primary data
mover collects record set information from the plurality
of primary storage controllers for each record update
and appends a prefix header to a predetermined group
of record set informations. The prefix header and
predetermined group of record set informations form the
self describing record sets. Each record set information
includes a primary device address, a cylinder number
and head number (CCHH), a record update sequence
number, a write I/O type, a search argument, a sector
number, and a record update time. The prefix header
includes a total data length, an operational time stamp,
a time interval group number, and a records read time.
A secondary processor at the secondary site has a
secondary data mover, the secondary data mover receiving
the self describing record sets from the primary site. A
plurality of secondary storage controllers are coupled to
the secondary processor, and a plurality of secondary
storage devices are coupled to the secondary storage
controllers for storing copies of the record updates. The
secondary data mover determines whether the transmitted
self describing record sets are complete, forms
consistency groups from the self describing record sets,
and provides the record updates from each consistency
group to the plurality of secondary storage controllers
for writing to the plurality of secondary storage devices
in an order consistent with a sequence that the record
updates were written to the plurality of primary storage
devices.
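
The fields listed in the summary above can be pictured as
simple data structures. The following is a sketch only. The
field names mirror the items named in the text, while the
Python types, the payload field and the ordered_updates
helper are assumptions and do not describe the actual
record layout. The helper shows one plausible way to obtain
a sequence consistent order: sort by record update time and
break ties with the record update sequence number.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RecordSetInformation:
    primary_device_address: str
    cchh: Tuple[int, int]        # cylinder number and head number
    sequence_number: int         # record update sequence number
    write_io_type: str
    search_argument: str
    sector_number: int
    update_time: float           # record update time
    data: bytes = b""            # hypothetical payload of the record update


@dataclass
class PrefixHeader:
    total_data_length: int
    operational_time_stamp: float
    time_interval_group_number: int
    records_read_time: float


@dataclass
class SelfDescribingRecordSet:
    header: PrefixHeader
    records: List[RecordSetInformation] = field(default_factory=list)


def ordered_updates(record_sets):
    """Flatten record sets and order the updates by time, then sequence."""
    updates = [r for rs in record_sets for r in rs.records]
    return sorted(updates, key=lambda r: (r.update_time, r.sequence_number))


first = SelfDescribingRecordSet(
    PrefixHeader(0, 100.0, 1, 101.0),
    [RecordSetInformation("0A01", (3, 0), 2, "UPDATE", "R5", 7, 100.4)],
)
second = SelfDescribingRecordSet(
    PrefixHeader(0, 100.0, 1, 101.0),
    [RecordSetInformation("0B02", (1, 1), 1, "UPDATE", "R1", 2, 100.1)],
)
print([u.primary_device_address for u in ordered_updates([first, second])])
```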

While the invention has been particularly shown 
and described with reference to preferred embodiments 
thereof, it will be understood by those skilled in the art 
that various changes in form and details may be made 
therein without departing from the scope of the invention 
as defined in the appended claims. For example, the 
consistency groups have been described as being 
formed by the secondary data mover based upon re- 
ceived self describing record sets, however, the consist- 
ency groups could be formed at the primary site based 
upon write record sets or elsewhere in the secondary 
site. The formats of the storage devices at the primary and
secondary sites need not be identical. For example, 
CKD records could be converted to fixed block architec- 
ture (FBA) type records, etc. Nor are the storage devices 
meant to be limited to DASD devices. 



Claims 

1. A method of providing asynchronous data duplex- 
ing, wherein data updates generated by one or 
more applications running in a primary processor 
are received by a primary storage subsystem, the 
primary storage subsystem causing I/O write oper- 
ations to write each data update therein, each write 
I/O operation being time stamped, the time stamps 
synchronized by a common timer, and wherein a 
secondary system, whether local to or remote from 
the primary processor, shadows the data updates 
in sequence consistent order such that the second- 
ary site is available for disaster recovery purposes, 
the method comprising the steps of: 

(a) time stamping each write I/O operation oc- 
curring in the primary storage subsystem; 

(b) capturing write I/O operation record set in- 
formation from the primary storage subsystem 
for each data update; 



(c) generating self describing record sets from
the data updates and respective record set
informations, the self describing record sets
containing sufficient control information to enable
recreation of a sequence of the write I/O
operations solely by the secondary system;

(d) grouping the self describing record sets into
interval groups, each interval group being
measured from an operational time stamp start
time and continuing for a predetermined
interval threshold; and

(e) selecting a current consistency group as
that interval group of self describing record sets
having an earliest operational time stamp, the
individual data updates being ordered within
the current consistency group based upon time
sequences of the I/O write operations in the
primary storage subsystem.

2. The method as claimed in claim 1 wherein the step
(b) further includes initiating sessions with the
primary storage subsystem based upon the operational
time stamps for identifying a starting time for each
interval group, each interval group being bounded
by consecutive operational time stamps.

3. The method as claimed in claim 1 or claim 2 wherein
the step (d) includes adding a prefix header describing
each interval group.

4. The method as claimed in any preceding claim
further comprising a step (f) transmitting the interval
groups of self describing record sets to the
secondary site.

5. The method as claimed in claim 4 further comprising
a step (g) determining at the secondary site
whether each received self describing record set is
complete.

6. The method as claimed in claim 5 wherein the step
(g) further includes the secondary site requesting
the primary site to re-transmit any missing data
updates if the secondary site determined a self
describing record set is incomplete.

7. The method as claimed in claim 6 further comprising
a step (h) determining at the secondary site
whether each time interval group is complete.

8. The method as claimed in claim 7 wherein the step
(h) further includes the secondary site requesting
the primary site to re-send a missing record set if
the secondary site determined that an interval
group was incomplete.

9. The method as claimed in claim 8 further comprising
a step (i) writing the received data updates at
the secondary site to the secondary storage
subsystem according to the sequence of the
corresponding write I/O operations at the primary site as
ordered in the consistency groups.

10. A method for providing remote data shadowing for
disaster recovery purposes in a data processing
system including a primary site having a primary
processor running a primary data mover and
applications generating record updates, the primary
processor coupled to a primary storage subsystem
having storage devices for storing the record
updates according to write I/O operations issued by
the primary processor to the primary storage
subsystem, the primary site further including a common
system timer for synchronizing time dependent
operations in the primary site, the system further
including a secondary site having a secondary
processor communicating with the primary processor,
and a secondary storage subsystem for storing copies
of the record updates in sequence consistent order,
the method comprising the steps of:

(a) time stamping each write I/O operation in
the primary storage subsystem;

(b) establishing a session with each storage
device in the primary storage subsystem;

(c) capturing record set information from each
storage device in the primary storage
subsystem;

(d) reading record sets and respective record
set information in the primary data mover;

(e) prefixing each record set with a header and
creating self describing record sets therefrom;

(f) transmitting the self describing record sets
to the secondary processor in time interval
groups according to predetermined time
intervals;

(g) forming consistency groups from the self
describing record sets; and

(h) shadowing the record updates of each
consistency group to the secondary storage
subsystem in a sequence consistent order.

11. The method as claimed in claim 10 wherein the
record sets are transmitted to the secondary
processor asynchronously.

12. The method as claimed in claim 10 or claim 11
wherein the step (g) is performed at the secondary
site.

13. The method as claimed in any of claims 10 to 12 
wherein the step (f) further includes determining at 
the secondary site whether each received self de- 
scribing record set is complete. 

14. The method as claimed in claim 13 wherein the step
(f) further includes requesting the primary site to
re-transmit any missing record updates if the primary
site determined a received self describing record
set is incomplete.

15. The method as claimed in any of claims 10 to 14 
further comprising a step (i) determining at the sec- 
ondary site whether each time interval group is 
complete. 

16. The method as claimed in claim 15 wherein the step
(i) further includes requesting the primary site to
re-send a missing record set if the secondary site
determined that an interval group was incomplete.

17. The method as claimed in any of claims 10 to 16 
wherein the step (c) identifies, in the record set in- 
formation, a physical location on the primary stor- 
age devices where each record update is stored. 

18. The method as claimed in claim 17 wherein the step
(c) identifies, in the record set information, a
sequence and time of update of each record update
stored to the primary storage devices within the
session.

19. The method as claimed in any of claims 10 to 18 
wherein the step (e) identifies, in the prefix header, 
an interval group number for the session and se- 
quence within group for each record update re- 
ferred to therein. 

20. A data processing system with a primary data
processing system and a secondary data processing
system, the primary data processing system
having a primary processor running one or more
applications, the one or more applications generating
record updates, the primary processor generating
self describing record sets therefrom, the self
describing record sets being sent to the secondary
system, the secondary system shadowing the
record updates in sequence consistent order based
upon the self describing record sets for real time
disaster recovery purposes, the primary processor
being coupled to a primary storage subsystem
wherein the primary storage subsystem receives the
record updates and executes write I/O operations
for storing each record update therein, the primary
processor comprising:

a timer for providing a common time source to
the applications and primary storage
subsystem for synchronization purposes; and

primary data mover means prompting the
primary storage subsystem for providing record
set information to the primary data mover
means for each record update, the primary data
mover means grouping a plurality of record
updates and each corresponding record set
information into time interval groups, and inserting
a prefix header thereto, each time interval
grouping the self describing record sets.

21. The primary system as claimed in claim 20 wherein
the primary storage subsystem comprises:

a plurality of primary storage controllers, the
plurality of primary storage controllers issuing
the write I/O operations; and

a plurality of primary storage devices coupled
to the plurality of primary storage controllers.

22. The primary system as claimed in claim 20 or claim
21 wherein the primary data mover means collects
record set information for each write I/O operation
for each primary storage controller of the plurality
of primary storage controllers participating with
each time interval group.

23. The primary system as claimed in any of claims 20
to 22 wherein each write I/O operation is
time-stamped in the primary processor relative to the
sysplex timer, each write I/O operation being issued
to a primary storage controller of the plurality of
primary storage controllers, each primary storage
controller preserving the time-stamp and returning that
time-stamp in the corresponding read record set to
the primary data mover means.

24. The primary system as claimed in claim 23 wherein
each record set information identifies a corresponding
record update's physical location on a primary
storage device of the plurality of primary storage
devices.

25. The primary system as claimed in claim 23 or claim
24 wherein each record set information identifies a
corresponding record update's primary subsystem
identification, primary device address, cylinder
number, and head number.

26. The primary system as claimed in claim 23 wherein
the primary data mover means identifies a relative
sequence of each write I/O update across all primary
storage controllers participating in a time interval
group.

27. The primary system as claimed in claim 26 wherein 
the primary data mover means creates a state table 
for journaling record updates and cross referencing 
a storage location of each record update on the pri- 
mary system and the secondary system. 

28. The primary system as claimed in claim 26 wherein 
the primary data mover means communicates the 
state table to the secondary system. 

29. A remote data shadowing system including a prima- 
ry site and a secondary site, the secondary site 
asynchronously shadowing record updates of the 
primary site in real time for disaster recovery pur- 
poses, the record updates generated by applica- 
tions running at the primary site, the primary site 
comprising: 

a sysplex timer; 

a primary processor running the applications 
generating the record updates and issuing a 
corresponding write I/O operation for each 
record update, the primary processor having a 
primary data mover therein; 

a plurality of primary storage controllers direct- 
ed to store the record updates, the plurality of 
primary storage controllers executing the is- 
sued write I/O operation for each record up- 
date; and 

a plurality of primary storage devices receiving 
and storing the record updates therein accord- 
ing to the corresponding write I/O operations, 

wherein the primary processor and each write
I/O are time-stamped by the primary processor,
as synchronized by the sysplex timer, such that
write I/O operations are accurately sequence
ordered relative to each other, the primary data
mover collecting sets of record updates and
combining each record set information as
provided by each one of the plurality of primary
storage controllers with the corresponding
record update, each record set information
including a relative sequence and time of each
corresponding write I/O operation, the primary
data mover collecting record updates into time
interval groups and inserting a prefix header to
each time interval group, wherein the prefix
header includes information identifying the
record updates included in each time interval
group, each record set information and prefix
header combined for creating self describing
record sets, the self describing record sets
being transmitted to the secondary site, wherein
the self describing record sets provide
information adequate for the secondary site to shadow
the record updates therein in sequence
consistent order without further communications from
the primary site.



Claims (translated from the German)

1. A method for providing asynchronous data duplexing,
in which data updates generated by one or more
applications running in a primary processor are
received by a primary storage subsystem, the primary
storage subsystem causing I/O write operations to
write each data update therein, each I/O write
operation being time stamped, the time stamps being
synchronized by a common timer, and wherein a
secondary system, regardless of whether it is local
to or remote from the primary processor, shadows the
data updates in sequence consistent order so that
this secondary site is available for disaster
recovery purposes, the method comprising the
following steps:

(a) providing each I/O write operation occurring
in the primary storage subsystem with a time
stamp;

(b) capturing the record set information about
the I/O write operation coming from the primary
storage subsystem for each data update;

(c) generating self describing record sets from
the data updates and the associated record set
information, the self describing record sets
containing sufficient control information to
enable recreation of a sequence of the I/O write
operations solely by the secondary system;

(d) grouping the self describing record sets into
interval groups, each interval group being
measured from an operational time stamp start
time up to a predetermined interval threshold;
and

(e) selecting a current consistency group as that
interval group of self describing record sets
having the earliest operational time stamp, the
individual data updates being ordered within the
current consistency group based on the time
sequences of the I/O write operations in the
primary storage subsystem.

2. The method according to claim 1, wherein step (b)
further comprises initiating sessions with the
primary storage subsystem based on the operational
time stamps for identifying a starting time for each
interval group, each interval group being bounded by
consecutive operational time stamps.

3. The method according to claim 1 or 2, wherein step
(d) further comprises adding a prefix header
describing each interval group.

4. The method according to any preceding claim,
further comprising a step (f) in which the interval
groups of self describing record sets are
transmitted to the secondary site.

5. The method according to claim 4, further comprising
a step (g) in which it is determined at the
secondary site whether each self describing record
set is complete.

6. The method according to claim 5, wherein step (g)
further includes the secondary site requesting the
primary site to re-transmit any missing data updates
if the secondary site has determined that a self
describing record set is incomplete.

7. The method according to claim 6, further comprising
a step (h) in which it is determined at the
secondary site whether each time interval group is
complete.

8. The method according to claim 7, wherein step (h)
further includes the secondary site requesting the
primary site to re-transmit any missing record set
if the secondary site has determined that an
interval group is incomplete.

9. The method according to claim 8, further comprising
a step (i) in which the received data updates are
written at the secondary site to the secondary
storage subsystem in accordance with the sequence of
the corresponding I/O write operations at the
primary site as specified in the consistency groups.

10. A method for providing remote data shadowing for
disaster recovery purposes in a data processing
system, comprising a primary site having a primary
processor running a primary data mover and
applications generating record updates, the primary
processor being coupled to a primary storage
subsystem having storage means for storing the
record updates in accordance with the I/O write
operations issued by the primary processor to the
primary storage subsystem, the primary site further
comprising a common system timer for synchronizing
time dependent operations at the primary site, the
system further comprising a secondary site having a
secondary processor communicating with the primary
processor, and a secondary storage subsystem for
storing copies of the record updates in sequence
order, the method comprising the following steps:

(a) providing each I/O write operation in the
primary storage subsystem with a time stamp;

(b) establishing a session with each storage
device in the primary storage subsystem;

(c) capturing the record set information from
each storage device in the primary storage
subsystem;

(d) reading record sets and corresponding record
set information in the primary data mover;

(e) providing each record set with a header and
creating self describing record sets therefrom;

(f) transmitting the self describing record sets
to the secondary processor in time interval
groups in accordance with the predetermined time
intervals;

(g) forming consistency groups from the self
describing record sets; and

(h) shadowing the record updates of each
consistency group to the secondary storage
subsystem in sequence order.

11. The method according to claim 10, wherein the
record sets are transmitted asynchronously to the
secondary processor.

12. The method according to claim 10 or 11, wherein
step (g) is performed at the secondary site.

13. The method according to claim 10 or 12, wherein
step (f) further comprises determining at the
secondary site whether each received self describing
record set is complete.

14. The method according to claim 13, wherein step (f)
further comprises requesting the primary site to
re-transmit any missing record updates if the
primary site has determined that a received self
describing record set is incomplete.

15. The method according to any of claims 10 to 14,
further comprising a step (i) in which it is
determined at the secondary site whether each time
interval group is complete.

16. The method according to claim 15, wherein step (i)
further comprises requesting the primary site to
re-transmit any missing record sets if the secondary
site has determined that an interval group is
incomplete.

17. The method according to any of claims 10 to 16,
wherein step (c) indicates, in the record set
information, a physical location in the primary
storage devices where each record update is stored.

18. The method according to claim 17, wherein step (c)
indicates, in the record set information, in each
session a sequence and a time of update of each
record update stored in the primary storage devices.

19. The method according to any of claims 10 to 18,
wherein step (e) indicates, in the prefix header, an
interval group number for the session and, for each
record update referred to therein, the sequence
within the group.

20. A data processing system with a primary data
processing system and a secondary data processing
system, the primary data processing system having a
primary processor running one or more applications,
the one or more applications generating record
updates, the primary processor generating self
describing record sets therefrom, the self
describing record sets being sent to the secondary
data processing system, the secondary data
processing system shadowing the record updates in
sequence consistent order based upon the self
describing record sets for real time disaster
recovery purposes, the primary processor being
coupled to a primary storage subsystem wherein the
primary storage subsystem receives the record
updates and executes I/O write operations for
storing each record update therein, the primary
processor comprising:

a timer for providing a common time source to the
applications and the primary storage subsystem
for synchronization purposes; and

primary data mover means which requests from the
primary storage subsystem, for each record
update, the provision of record set information
to the primary data mover means, the primary data
mover means dividing a plurality of record
updates and all associated record set information
into time interval groups and inserting a prefix
header into each of the record set informations,
each time interval grouping the self describing
record sets.

21. The primary data processing system according to
claim 20, wherein the primary storage subsystem
comprises:

a plurality of primary storage controllers, the
plurality of primary storage controllers issuing
the I/O write operations; and

a plurality of primary storage devices coupled to
the plurality of primary storage controllers.

22. The primary data processing system according to
claim 20 or 21, wherein the primary data mover means
collects record set information for each I/O write
operation for each primary storage controller of the
plurality of primary storage controllers
participating in each time interval group.

23. The primary data processing system according to any
of claims 20 to 22, wherein each I/O write operation
is time stamped in the primary processor relative to
the sysplex timer, each I/O write operation being
issued to a primary storage controller of the
plurality of primary storage controllers, each
primary storage controller preserving the time stamp
and returning it in the corresponding read record
set to the primary data mover means.

24. The primary data processing system according to
claim 23, wherein each record set information
indicates the physical location of a corresponding
record update on a primary storage device of the
plurality of primary storage devices.

25. The primary data processing system according to
claim 23 or 24, wherein each record set information
indicates the corresponding primary subsystem
identification of the record update, the primary
device address, the cylinder number, and the head
number.

26. The primary data processing system according to
claim 23, wherein the primary data mover means
indicates a relative sequence of each I/O write
update across all primary storage controllers
participating in a time interval group.

27. The primary data processing system according to
claim 26, wherein the primary data mover means
creates a state table for recording record updates
and establishes a cross reference to the storage
location of each record update in the primary data
processing system and in the secondary data
processing system.

28. The primary data processing system according to
claim 26, wherein the primary data mover means
passes the state table to the secondary data
processing system.

29. A remote data shadowing system comprising a primary site and a secondary site, wherein the secondary site asynchronously shadows, in real time, record updates of the primary site for disaster recovery purposes, the record updates generated by the applications being executed at the primary site, the primary site comprising:

a sysplex timer;

a primary processor that runs the applications generating the record updates and issues a corresponding I/O write operation for each record update, the primary processor having a primary data mover therein;

a plurality of primary storage controllers arranged to store the record updates, the plurality of primary storage controllers executing the issued I/O write operation for each record update; and

a plurality of primary storage devices that receive and store the record updates according to the respective I/O write operations,

wherein the primary processor and each I/O write operation are time stamped by the primary processor, as synchronized by the sysplex timer, so that I/O write operations are accurately ordered in sequence relative to one another, the primary data mover collecting sets of record updates and combining each record set information provided by each primary storage controller with the corresponding record update, each record set information comprising a relative sequence and time of each corresponding I/O write operation, the primary data mover collecting record updates into time interval groups and inserting a prefix header into each time interval group, the prefix header containing information identifying the record updates included in each time interval group, each record set information and each prefix header being combined to form self-describing record sets, the self-describing record sets being transmitted to the secondary site, where the self-describing record sets provide information with which the secondary site can shadow the record updates therein in the correct sequence without further communication from the primary site.

Claims

1. A method for asynchronously duplicating data, wherein data updates generated by one or more applications running on a primary processor are received by a primary storage subsystem, the primary storage subsystem causing I/O write operations to write each data update therein, each I/O write operation being time stamped, the time stamps being synchronized to a common timer, and wherein a secondary system, local to or remote from the primary processor, shadows the data updates in sequence consistent order such that the secondary site is available for disaster recovery purposes, the method comprising the steps of:

(a) time stamping each I/O write operation occurring in the primary storage subsystem;

(b) capturing I/O write operation record set information from the primary storage subsystem for each data update;

(c) generating self-describing record sets from the data updates and the respective record set information, the self-describing record sets containing control information sufficient to allow the secondary system alone to re-create a sequence of the I/O write operations;

(d) grouping the self-describing record sets into interval groups, each interval group being measured from an operational time stamp starting time and continuing for a predetermined interval threshold; and

(e) selecting a current consistency group as the interval group of self-describing record sets having the earliest operational time stamp, the individual data updates being ordered within the current consistency group according to the time sequences of the I/O write operations in the primary storage subsystem.

2. The method according to claim 1, wherein step (b) further comprises starting sessions with the primary storage subsystem, using the operational time stamps to identify a starting time for each interval group, each interval group being bounded by consecutive operational time stamps.

3. The method according to claim 1 or claim 2, wherein step (d) comprises adding a prefix header describing each interval group.

4. The method according to any preceding claim, further comprising a step (f) of transmitting the interval groups of self-describing record sets to the secondary site.



5. The method according to claim 4, further comprising a step (g) of determining at the secondary site whether each received self-describing record set is complete.

6. The method according to claim 5, wherein step (g) further comprises the secondary site requesting the primary site to retransmit any missing data updates if the secondary site has determined that a self-describing record set was incomplete.

7. The method according to claim 6, further comprising a step (h) of determining at the secondary site whether each time interval group is complete.

8. The method according to claim 7, wherein step (h) further comprises the secondary site requesting the primary site to resend a missing record set if the secondary site has determined that an interval group was incomplete.

9. The method according to claim 8, further comprising a step (i) of writing the data updates received at the secondary site to the secondary storage subsystem according to the sequence of the corresponding I/O write operations at the primary site, as ordered in the consistency groups.

10. A method for providing remote data shadowing for disaster recovery in a data processing system comprising a primary site having a primary processor running a primary data mover and running applications generating record updates, the primary processor being coupled to a primary storage subsystem having storage devices for storing the record updates according to the I/O write operations issued by the primary processor to the primary storage subsystem, the primary site further comprising a common system timer for synchronizing time dependent operations at the primary site, the system further comprising a secondary site having a secondary processor communicating with the primary processor, and a secondary storage subsystem for storing copies of the record updates in sequence consistent order, the method comprising the steps of:

(a) time stamping each I/O write operation in the primary storage subsystem;

(b) establishing a session with each storage device in the primary storage subsystem;

(c) capturing record set information from each storage device in the primary storage subsystem;

(d) reading the record sets and the respective record set information into the primary data mover;

(e) prefixing each record set with a header and creating self-describing record sets therefrom;

(f) transmitting the self-describing record sets to the secondary processor, forming time interval groups according to predetermined time intervals;

(g) forming consistency groups from the self-describing record sets; and

(h) shadowing the record updates of each consistency group to the secondary storage subsystem in sequence consistent order.

11. The method according to claim 10, wherein the record sets are transmitted asynchronously to the secondary processor.

12. The method according to claim 10 or claim 11, wherein step (g) is performed at the secondary site.

13. The method according to any of claims 10 to 12, wherein step (f) further comprises determining at the secondary site whether each received self-describing record set is complete.

14. The method according to claim 13, wherein step (f) further comprises the request by the primary site to retransmit any missing record updates if the primary site has determined that the received self-describing record set was incomplete.

15. The method according to any of claims 10 to 14, further comprising a step (i) of determining at the secondary site whether each time interval group is complete.

16. The method according to claim 15, wherein step (i) further comprises the request by the primary site for retransmission of a missing record set if the secondary site has determined that an interval group was incomplete.

17. The method according to any of claims 10 to 16, wherein step (c) identifies, in the record set information, a physical location on the primary storage devices at which each record update is stored.

18. The method according to claim 17, wherein step (c) identifies, in the record set information, a sequence and a time of update of each record update stored on the primary storage devices within the session.

19. The method according to any of claims 10 to 18, wherein step (e) identifies, in the prefix header, an interval group number for the session and the sequence within the group for each record update referenced therein.

20. A data processing system having a primary data processing system and a secondary data processing system, the primary data processing system having a primary processor running one or more applications, the one or more applications generating record updates, the primary processor generating self-describing record sets therefrom, the self-describing record sets being sent to the secondary system, the secondary system shadowing the record updates in sequence consistent order based upon the self-describing record sets for real-time disaster recovery purposes, the primary processor being coupled to a primary storage subsystem, wherein the primary storage subsystem receives the record updates and executes I/O write operations to store each record update therein, the primary processor comprising:

a timer for providing a common time source to the applications and to the primary storage subsystem for synchronization purposes; and

primary data mover means for prompting the primary storage subsystem to provide record set information to the primary data mover means for each record update, the primary data mover means grouping a plurality of record updates and each corresponding record set information into time interval groups, and inserting a prefix header therein, for each time interval group defined by the self-describing record sets.

21. The primary system according to claim 20, wherein the primary storage subsystem comprises:

a plurality of primary storage controllers, the plurality of primary storage controllers issuing the I/O write operations;

a plurality of primary storage devices coupled to the plurality of primary storage controllers.

22. The primary system according to claim 20 or claim 21, wherein the primary data mover means collects record set information for each I/O write operation for each primary storage controller of the plurality of primary storage controllers participating in each time interval group.

23. The primary system according to any of claims 20 to 22, wherein each I/O operation is time stamped in the primary processor relative to the sysplex timer, each I/O write operation being issued to a primary storage controller of the plurality of primary storage controllers, each primary storage controller preserving the time stamp and returning the time stamp in the corresponding read record set to the primary data mover means.

24. The primary system according to claim 23, wherein each record set information identifies a physical location of a corresponding record update on a primary storage device of the plurality of primary storage devices.

25. The primary system according to claim 23 or claim 24, wherein each record set information identifies a corresponding primary subsystem identification of the record update, a primary device address, a cylinder number and a head number.

26. The primary system according to claim 23, wherein the primary data mover means identifies a relative sequence of each I/O write update across all participating primary storage controllers in a time interval group.

27. The primary system according to claim 26, wherein the primary data mover means creates a state table for journaling record updates and for cross-referencing a storage location of each record update on the primary system and the secondary system.

28. The primary system according to claim 26, wherein the primary data mover means communicates the state table to the secondary system.

29. A remote data shadowing system comprising a primary site and a secondary site, the secondary site asynchronously shadowing record updates of the primary site in real time for disaster recovery purposes, the record updates having been generated by applications running at the primary site, the primary site comprising:

a sysplex timer;

a primary processor running the applications generating the record updates and issuing a corresponding I/O write operation for each record update, the primary processor having a primary data mover therein;

a plurality of primary storage controllers arranged to store the record updates, the plurality of primary storage controllers executing the I/O write operation issued for each record update; and

a plurality of primary storage devices receiving and storing the record updates therein according to the corresponding I/O write operations,

wherein the primary processor and each I/O write operation are time stamped by the primary processor, as synchronized by the sysplex timer, such that I/O write operations are accurately ordered in sequence relative to one another, the primary data mover collecting sets of record updates and combining each record set information, as provided by each of the plurality of primary storage controllers, with the corresponding record update, each record set information including a relative sequence and the time of each corresponding I/O write operation, the primary data mover collecting the record updates into time interval groups and inserting a prefix header into each time interval group, wherein the prefix header includes information identifying the record updates included in each time interval group, each record set information and prefix header being combined to create self-describing record sets, the self-describing record sets being transmitted to the secondary site, wherein the self-describing record sets provide adequate information for the secondary site to shadow the record updates therein in sequence consistent order without further communication from the primary site.
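
Steps (d) and (e) of claim 1 can be illustrated with a minimal Python sketch. It is an illustration only, not the claimed implementation: the `RecordSet` class, the function names, and the use of seconds as the time unit are assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordSet:
    """Hypothetical self-describing record set; only the fields used here."""
    ssid: str          # primary subsystem identifier
    time_stamp: float  # operational time stamp, in seconds (assumed unit)
    payload: bytes = b""


def group_by_interval(record_sets, start_time, threshold):
    """Step (d): bucket record sets into interval groups of width `threshold`."""
    groups = defaultdict(list)
    for rs in record_sets:
        index = int((rs.time_stamp - start_time) // threshold)
        groups[index].append(rs)
    return dict(groups)


def current_consistency_group(groups):
    """Step (e): take the interval group holding the earliest time stamp and
    order its updates by the time of the original write."""
    earliest = min(groups, key=lambda i: min(rs.time_stamp for rs in groups[i]))
    return sorted(groups[earliest], key=lambda rs: rs.time_stamp)
```

With a threshold of one second and a start time of zero, record sets stamped 0.5 and 0.1 land in the same interval group and come back ordered as 0.1, 0.5.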






EP 0 672 985 B1 



FIG. 1

(Block diagram. Labeled blocks: PRIMARY HOST with I/O & ERP; PRIMARY STORAGE CONTROLLER; SECONDARY HOST; SECONDARY STORAGE CONTROLLER; SEC-DASD. Legible reference numerals: 5, 6, 8, 9, 10, 11, 13, 14, 15, 18, 19.)



FIG. 2

(Flow diagram. Box labels, in the order they appear: START; SEND DATA UPDATE TO PRIMARY STORAGE CONTROLLER; WRITE DATA UPDATE TO PRIMARY DASD, SEND TO SECONDARY DASD; YES branch: WRITE DATA UPDATE TO SECONDARY DASD; REPORT FAILED DUPLEX TO PRIMARY PROCESSOR; PRIMARY STORAGE CONTROLLER REPORTS I/O STATUS CE/DE UNIT CHECK; ERROR RECOVERY PROGRAM TAKES CONTROL. Reference numerals shown: 200, 201, 203, 207, 209, 211, 213.)



FIG. 3

(Flow diagram. Box labels, in the order they appear: ERP ISSUES SENSE I/O TO PRIMARY STORAGE CONTROLLER; ERP ISSUES STORAGE CONTROLLER LEVEL I/O INDICATING FAILED SYNCHRONOUS REMOTE COPY; DATA INTEGRITY MAINTAINED; SUCCESSFUL COMPLETION OF I/O OPERATIONS? (YES branch); RETURN CONTROL TO PRIMARY PROCESSOR; NOTIFY OPERATOR OF FAILED SYNCHRONOUS REMOTE COPY; UPDATE ERROR LOG RECORDING DATA SET; POST PRIMARY APPLICATION WRITE I/O WITH PERMANENT ERROR. Reference numerals shown: 221, 223, 225, 227, 229, 231, 233, 235.)



FIG. 4

(Block diagram. Labeled blocks: PRIMARY HOST 401 containing APPLN 1, APPLN 2 and DATA MOVER (PRIMARY); STORAGE CONTROLLERS; PRIMARY DASD; SECONDARY HOST 411 containing DATA MOVER (SECONDARY); STORAGE CONTROLLERS; SECONDARY DASD 416; CONTROL INFO DASD 417. Other legible reference numerals: 400, 402, 403, 405, 407, 408, 414, 415, 421, 431.)



FIG. 5

PREFIX HEADER 500

501  TOTAL DATA LENGTH
502  OPERATIONAL TIME STAMP
503  TIME INTERVAL GROUP NUMBER
504  SEQUENCE NUMBER WITHIN GROUP
505  PRIMARY SSID
506  SECONDARY TARGET VOLUME
507  RECORDS READ TIME
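
The prefix header fields listed for FIG. 5 map naturally onto a small record type. The sketch below is illustrative only: the field names follow the figure labels, while the Python types, the helper `make_self_describing`, and the choice to count the total data length as the payload length in bytes are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefixHeader:
    """Fields named after the FIG. 5 labels; the Python types are assumptions."""
    total_data_length: int
    operational_time_stamp: float
    time_interval_group_number: int
    sequence_number_within_group: int
    primary_ssid: str
    secondary_target_volume: str
    records_read_time: float


def make_self_describing(payload, *, time_stamp, group, sequence,
                         primary_ssid, target_volume, read_time):
    """Pair a record set payload with a header that describes it.

    Assumption: total data length is taken to be the payload length in bytes.
    """
    header = PrefixHeader(
        total_data_length=len(payload),
        operational_time_stamp=time_stamp,
        time_interval_group_number=group,
        sequence_number_within_group=sequence,
        primary_ssid=primary_ssid,
        secondary_target_volume=target_volume,
        records_read_time=read_time,
    )
    return header, payload
```

Because the header carries the group number and the sequence within the group, a receiver can place each record set without asking the sender for more context.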



FIG. 6

(Rotated drawing; the lettering is largely illegible in the scan. Legible fragments: SEQUENCE, SPECIFIC DATA, KEY/DATA.)



FIG. 7

STATE TABLE 700

CONFIGURATION INFORMATION (reference numerals shown: 710, 712)

PRIMARY                              SECONDARY
SSID #1                              SSID #1'
  VOLUME 1                             VOLUME 1'
    EXTENT (CCHH)L, (CCHH)H              EXTENT (CCHH)L, (CCHH)H
  VOLUME 2                             VOLUME 2'
SSID #2                              SSID #2'
  VOLUME 1                             VOLUME 1'
  VOLUME 2                             VOLUME 2'

JOURNAL RECORD LAST APPLIED

FIG. 8

MASTER JOURNAL 600

CONSISTENCY GROUP NUMBER
LOCATION ON JOURNAL
OPERATIONAL TIME STAMP

SPECIFIC RECORDS GROUPED BY "SOFTWARE CONSISTENCY GROUP"



FIG. 9

PHYSICAL        OPERATIONAL   TIME INT   READ RECORD SET: TIME OF UPDATE / CONTROLLER
CONTROLLER ID   TIME STAMP    GROUP #    SEQ. 1 OF 3   SEQ. 2 OF 3   SEQ. 3 OF 3
SSID1           T1            G1         11:59         12:00         12:01
SSID2           T1            G1         12:00         12:02
SSID3           T1            G1         11:58         11:59         12:02
SSID1           T2            G2
SSID2           T2            G2
SSID3           T2            G2
                T3

(The circled ordering marks printed beside the update times are not legible in the scan.)

CONSISTENCY GROUP #1: 11:58, 11:59, 11:59, 12:00, 12:00, 12:01

EARLIEST OPERATIONAL TIME T1.
EARLIEST TIME OF UPDATE ACROSS SSID, ORDERED IN READ SEQUENCE.
MIN TIME OF THE MAX TIMES OF UPDATE ACROSS ALL SSIDS.
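
The FIG. 9 legend "MIN TIME OF THE MAX TIMES OF UPDATE ACROSS ALL SSIDS" can be worked through with the update times shown in the figure. The following Python sketch is one reading of that legend, not the patented procedure itself: the function name, the tuple layout, and the use of "HH:MM" strings are assumptions made for illustration.

```python
# Update times per controller for time interval group G1, as read from FIG. 9.
FIG9_TIMES = {
    "SSID1": ["11:59", "12:00", "12:01"],
    "SSID2": ["12:00", "12:02"],
    "SSID3": ["11:58", "11:59", "12:02"],
}


def form_consistency_group(updates_by_ssid):
    """Split updates into (cutoff, current group, carried-over remainder).

    `updates_by_ssid` maps a controller id to its update times in read order.
    Times are "HH:MM" strings, which compare correctly as text within one day.
    """
    # Latest update seen from each controller that reported anything.
    latest = [max(times) for times in updates_by_ssid.values() if times]
    cutoff = min(latest)  # the min of the max times across all SSIDs

    tagged = [
        (time, seq, ssid)
        for ssid, times in updates_by_ssid.items()
        for seq, time in enumerate(times, start=1)
    ]
    tagged.sort()  # by time, then read sequence, then controller id
    group = [u for u in tagged if u[0] <= cutoff]
    remainder = [u for u in tagged if u[0] > cutoff]
    return cutoff, group, remainder
```

For the FIG. 9 data the cutoff is 12:01, the group holds the times 11:58, 11:59, 11:59, 12:00, 12:00, 12:01 (matching CONSISTENCY GROUP #1 in the figure), and the two 12:02 updates are carried over to the next group. The cutoff guarantees that every controller has already reported all of its updates up to that time.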



FIG. 10

(Flow diagram. Box labels, in the order they appear: START; TIME STAMP EACH APPLN I/O OPERATION; START SESSION WITH EACH PRIMARY DASD VOLUME; CAPTURE RECORD SET INFORMATION; READ CAPTURED RECORD SET INFORMATION; PREFIX RECORD SETS WITH HEADER; COMMUNICATE JOURNALED RECORDS TO SECONDARY; GATHER RECORDS AT SECONDARY BY TIME INTERVALS; RECORD SET INFO COMPLETE? NO branch: REQUEST PRIMARY SITE TO RESEND MISSING RECORDS; YES branch: FORM CONSISTENCY GROUP. Reference numerals shown: 1000, 1010, 1020, 1030, 1040, 1050, 1060, 1070, 1080, 1085, 1090.)
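
The completeness decision in FIG. 10 ("RECORD SET INFO COMPLETE?" followed by "REQUEST PRIMARY SITE TO RESEND MISSING RECORDS") can be sketched as a gap check over sequence numbers. This is a hedged illustration: it assumes each record set carries a 1-based sequence number and that the expected count per group is known, and `request_resend` is a hypothetical callback standing in for whatever channel reaches the primary site.

```python
def missing_sequences(received, expected_count):
    """Return the 1-based sequence numbers absent from a time interval group."""
    return sorted(set(range(1, expected_count + 1)) - set(received))


def check_group(received, expected_count, request_resend):
    """Return True when the group is complete; otherwise ask for the gaps."""
    missing = missing_sequences(received, expected_count)
    if missing:
        request_resend(missing)  # hypothetical callback to the primary site
        return False
    return True
```

A caller would form the consistency group only after `check_group` returns True, which mirrors the YES branch of the figure.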



FIG. 11

(Flow diagram. Box labels, in the order they appear: START; TIME INTERVAL GROUPS COMPLETE? NO branch: RETRY READING RECORD SETS FROM PRIMARY, ELSE FAIL; YES branch: DETERMINE FIRST CONSISTENCY GROUP JOURNAL RECORD; DETERMINE LAST RECORD IN CURRENT CONSISTENCY GROUP (MIN-TIME); ORDER REMAINING RECORDS BY TIME AND SEQUENCE; PASS REMAINING RECORDS TO NEXT CONSISTENCY GROUP; WRITE RECORDS OF CURRENT CONSISTENCY GROUPS TO SECONDARY DASDS. Reference numerals shown: 1110, 1115, 1120, 1130, 1140, 1150, 1160.)



FIG. 12

(Rotated matrix; the cell entries are not legible in the scan. Legible axis labels: READ RECORD SET BUFFER #2; I/O WRITE OPERATION. Operation types listed on the axes: UPDATE WRITE, KL = 0; UPDATE WRITE, KL ≠ 0; FORMAT WRITE FULL; FORMAT WRITE PARTIAL; ERASE FULL; ERASE PARTIAL; WRITE ANY, KL = 0.)



FIG. 13

FIG. 13A

B - IF #1'S SEARCH ARG IS HIGHER THAN THE SEARCH ARG FOR #2, THEN THROW #1, ELSE DO BOTH.

C - IF #1'S RECORD IS EQUAL TO OR HIGHER THAN THE FIRST RECORD IN #2, THEN THROW #1, ELSE DO BOTH.

D - IF #2 IS UPDATING R0, THEN DO BOTH, ELSE ERROR.

E - ERROR (SHOULD NEVER HAPPEN).

E* - ERROR IF #1 AND #2 ARE THE SAME RECORD (SHOULD NEVER HAPPEN WITHOUT A FORMAT WRITE IN BETWEEN).

F - IF FIRST RECORD IN #2 IS R1, THEN WRITE BOTH, ELSE ERROR.

G - IF #1'S SEARCH ARG IS EQUAL TO OR HIGHER THAN THE SEARCH ARG FOR #2, THEN THROW #1, ELSE ERROR.

H - IF #1'S SEARCH ARG IS HIGHER THAN THE LAST RECORD FOR #2, THEN THROW #1, ELSE IF #2'S SEARCH ARG IS HIGHER THAN THE LAST RECORD IN #1, THEN ERROR, ELSE WRITE BOTH.

TO OPTIMIZE FURTHER, CAN DO THE FOLLOWING INSTEAD:
IN #2) OR (THE LAST RECORD IN #1 IS EQUAL TO OR HIGHER THAN THE LAST RECORD IN #2) AND (2'S SEARCH

J - IF #2'S RECORD (OR SEARCH) IS HIGHER THAN THE LAST RECORD IN #1, THEN ERROR, ELSE WRITE BOTH.

K - IF #2'S RECORD (OR SEARCH) IS HIGHER THAN #1'S SEARCH, THEN ERROR, ELSE WRITE BOTH.

L - IF #1'S SEARCH ARG IS EQUAL TO OR HIGHER THAN #2'S SEARCH ARG, THEN EITHER WRITE BOTH OR OK TO THROW #1, ELSE ERROR.

M - IF (#1'S SEARCH ARG IS EQUAL TO OR HIGHER THAN #2'S SEARCH ARG) THEN THROW #1
    ELSE IF (#2'S SEARCH ARG IS HIGHER THAN THE LAST RECORD IN #1) THEN ERROR
    ELSE WRITE BOTH.

N - IF #2'S SEARCH ARG IS HIGHER THAN THE LAST RECORD IN #1, THEN ERROR, ELSE WRITE BOTH.

R - OK TO THROW #1.
T - MUST THROW #1.
W - WRITE BOTH.

W* - IF #1 AND #2 HAVE THE SAME RECORDS, THEN THROW #1, ELSE DO BOTH OR MERGE RECORDS AND DO ONE WRITE.
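
Each lettered rule is a small decision over two buffered writes, #1 and #2. As an illustration, rule M can be written as a three-way branch. The sketch below is hypothetical: it models search arguments and record positions as plain integers, and the `Action` enum and `rule_m` name are inventions of this example rather than anything named in the figure.

```python
from enum import Enum


class Action(Enum):
    THROW_1 = "throw #1"
    WRITE_BOTH = "write both"
    ERROR = "error"


def rule_m(search_arg_1, last_record_1, search_arg_2):
    """Rule M from the list above, with records modelled as integers."""
    if search_arg_1 >= search_arg_2:
        return Action.THROW_1      # #1 is discarded; only #2 is applied
    if search_arg_2 > last_record_1:
        return Action.ERROR        # #2 starts beyond the records #1 covers
    return Action.WRITE_BOTH
```

The other rules follow the same pattern with different comparisons, so a table lookup from rule letter to a function of this shape would cover the whole list.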